NOTES FOR CMOS DEVICES


PRECAUTION AGAINST ESD FOR SEMICONDUCTORS

Note: 

A strong electric field applied to a MOS device can destroy the gate oxide and ultimately degrade the device operation. Steps must be taken to prevent the generation of static electricity as much as possible, and to dissipate it quickly once it has occurred. Environmental control must be adequate: when the air is dry, a humidifier should be used. It is recommended to avoid using insulators that easily build up static electricity. Semiconductor devices must be stored and transported in an anti-static container, static shielding bag, or conductive material. All test and measurement tools, including work benches and floors, should be grounded. The operator should be grounded using a wrist strap. Semiconductor devices must not be touched with bare hands. Similar precautions need to be taken for PW boards with semiconductor devices mounted on them.


HANDLING OF UNUSED INPUT PINS FOR CMOS 

Note: 

Leaving CMOS device inputs unconnected can cause malfunction. If no connection is provided to the input pins, an internal input level may be generated by noise, etc., causing the device to malfunction. CMOS devices behave differently from bipolar or NMOS devices. The input levels of CMOS devices must be fixed high or low by using pull-up or pull-down circuitry. Each unused pin that may possibly function as an output pin should be connected to VDD or GND through a resistor. All handling of unused pins must be judged for each device individually and according to the specifications governing that device.


STATUS BEFORE INITIALIZATION OF MOS DEVICES 

Note: 

Power-on does not necessarily define the initial status of a MOS device. The production process of MOS devices does not define their initial operating status. Immediately after the power source is turned ON, devices with a reset function have not yet been initialized. Hence, power-on does not guarantee output pin levels, I/O settings, or the contents of registers. A device is not initialized until the reset signal is received. For devices having a reset function, a reset operation must be executed immediately after power-on.


VR Series, VR4300 Series, VR3000, VR4000, VR4100, VR4200, VR4300, VR4305, VR4310, and VR4400 are
trademarks of NEC Corporation.

UNIX is a registered trademark licensed by X/Open Company Limited in the US and other countries.
MC68000 is a trademark of Motorola Inc. 

IBM370 is a trademark of International Business Machines Corporation. 

iAPX is a trademark of Intel Corporation. 

DEC VAX is a trademark of Digital Equipment Corporation. 

MIPS is a registered trademark of MIPS Technologies, Inc. in the U.S.A. 
Exporting this product or equipment that includes this product may require a governmental license from the U.S.A. for some
countries because this product utilizes technologies limited by the export control regulations of the U.S.A. 


The information in this document is current as of October, 1999. The information is subject to 
change without notice. For actual design-in, refer to the latest publications of NEC's data sheets or 
data books, etc., for the most up-to-date specifications of NEC semiconductor products. Not all 
products and/or types are available in every country. Please check with an NEC sales representative 
for availability and additional information. 

No part of this document may be copied or reproduced in any form or by any means without prior
written consent of NEC. NEC assumes no responsibility for any errors that may appear in this document.

NEC does not assume any liability for infringement of patents, copyrights or other intellectual property rights of
third parties by or arising from the use of NEC semiconductor products listed in this document or any other
liability arising from the use of such products. No license, express, implied or otherwise, is granted under any
patents, copyrights or other intellectual property rights of NEC or others.

Descriptions of circuits, software and other related information in this document are provided for illustrative
purposes in semiconductor product operation and application examples. The incorporation of these
circuits, software and information in the design of customer's equipment shall be done under the full
responsibility of customer. NEC assumes no responsibility for any losses incurred by customers or third
parties arising from the use of these circuits, software and information.

While NEC endeavours to enhance the quality, reliability and safety of NEC semiconductor products, customers 
agree and acknowledge that the possibility of defects thereof cannot be eliminated entirely. To minimize 
risks of damage to property or injury (including death) to persons arising from defects in NEC 
semiconductor products, customers must incorporate sufficient safety measures in their design, such as 
redundancy, fire-containment, and anti-failure features. 

NEC semiconductor products are classified into the following three quality grades:
"Standard", "Special" and "Specific". The "Specific" quality grade applies only to semiconductor products
developed based on a customer-designated "quality assurance program" for a specific application. The
recommended applications of a semiconductor product depend on its quality grade, as indicated below.

Customers must check the quality grade of each semiconductor product before using it in a particular
application.

"Standard": Computers, office equipment, communications equipment, test and measurement equipment, audio 
and visual equipment, home electronic appliances, machine tools, personal electronic equipment 
and industrial robots 

"Special": Transportation equipment (automobiles, trains, ships, etc.), traffic control systems, anti-disaster 
systems, anti-crime systems, safety equipment and medical equipment (not specifically designed 
for life support) 

"Specific": Aircraft, aerospace equipment, submersible repeaters, nuclear reactor control systems, life 
support systems and medical equipment for life support, etc. 

The quality grade of NEC semiconductor products is "Standard" unless otherwise expressly specified in NEC's 
data sheets or data books, etc. If customers wish to use NEC semiconductor products in applications not 
intended by NEC, they must contact an NEC sales representative in advance to determine NEC's willingness 
to support a given application. 

(Note) 

(1) "NEC" as used in this statement means NEC Corporation and also includes its majority-owned subsidiaries. 

(2) "NEC semiconductor products" means any semiconductor product developed or manufactured by or for
NEC (as defined above).
M8E 00.4 
Regional Information


Some information contained in this document may vary from country to country. Before using any NEC 
product in your application, please contact the NEC office in your country to obtain a list of authorized 
representatives and distributors. They will verify: 


• Device availability
• Ordering information
• Product release schedule
• Availability of related technical literature
• Development environment specifications (for example, specifications for third-party tools and
  components, host computers, power plugs, AC supply voltages, and so forth)
• Network requirements

In addition, trademarks, registered trademarks, export restrictions, and other legal issues may also vary
from country to country.


NEC Electronics Inc. (U.S.) 

Santa Clara, California 

Tel: 408-588-6000 
800-366-9782 

Fax: 408-588-6130 
800-729-9288 


NEC Electronics (Germany) GmbH 
Duesseldorf, Germany 

Tel: 0211-65 03 02 

Fax: 0211-65 03 490 


NEC Electronics (UK) Ltd. 
Milton Keynes, UK 

Tel: 01908-691-133 

Fax: 01908-670-290 


NEC Electronics Italiana s.r.l. 
Milano, Italy 

Tel: 02-66 75 41 

Fax: 02-66 75 42 99 


NEC Electronics (Germany) GmbH 
Benelux Office 

Eindhoven, The Netherlands 

Tel: 040-2445845 

Fax: 040-2444580 


NEC Electronics (France) S.A. 
Velizy-Villacoublay, France 

Tel: 01-30-67 58 00 

Fax: 01-30-67 58 99 


NEC Electronics (France) S.A. 
Madrid Office 

Madrid, Spain 

Tel: 91-504-2787 

Fax: 91-504-2860 


NEC Electronics (Germany) GmbH 
Scandinavia Office 

Taeby, Sweden 

Tel: 08-63 80 820 

Fax: 08-63 80 388 
NEC Electronics Hong Kong Ltd.
Hong Kong 

Tel: 2886-9318 

Fax: 2886-9022/9044 


NEC Electronics Hong Kong Ltd. 
Seoul Branch 

Seoul, Korea 

Tel: 02-528-0303 

Fax: 02-528-4411


NEC Electronics Singapore Pte. Ltd. 
United Square, Singapore 

Tel: 65-253-8311

Fax: 65-250-3583 


NEC Electronics Taiwan Ltd. 
Taipei, Taiwan 

Tel: 02-2719-2377 

Fax: 02-2719-5951 


NEC do Brasil S.A. 
Electron Devices Division 
Guarulhos-SP Brasil 

Tel: 55-11-6462-6810 
Fax: 55-11-6462-6829 


J00.7 


Major Revisions in This Edition 


Page          Description
p.33          1.1 Characteristics: Correction of description
p.35          1.4.1 Internal Block Configuration: Correction of description
p.166         6.3.5 Status Register (12): Correction of description
p.198         6.4.17 Watch Exception: Correction and addition of description
p.244         8.2.7 Unimplemented Operation Exception (E): Addition of description
p.254         9.3.1 Power Modes: Correction of description
pp.259, 260   10.2 Basic System Clocks: Correction of description
p.264         10.4 Low Power Mode Operation: Correction of description
p.360         15.1 Features: Correction of description
p.360         15.1.2 Low Power Mode: Correction of description
              17.5 FPU Instructions: Addition of description to the following instructions:
p.568           CEIL.L.fmt
p.570           CEIL.W.fmt
p.574           CVT.D.fmt
p.576           CVT.L.fmt
p.578           CVT.S.fmt
p.580           CVT.W.fmt
p.587           FLOOR.L.fmt
p.589           FLOOR.W.fmt
p.600           ROUND.L.fmt
p.602           ROUND.W.fmt
p.610           TRUNC.L.fmt
p.612           TRUNC.W.fmt
p.628         Table A-1 Differences Between the VR4300, VR4305, and VR4310: Correction of description
p.630         B.1.3 Status Register: Correction of description
p.632         Table B-1 Differences in Software: Correction of description
p.634         B.2.2 System Interface: Correction of description
p.635         Table B-2 Differences in System Design: Correction of description
p.639         Table B-3 Other Differences: Correction of description
p.644         C.2.2 Clock: Correction of description
pp.647, 648   Appendix D Restrictions of VR4300: Addition


The mark ★ shows major revised points.


User's Manual U10504EJ7V0UM00


PREFACE


Readers
This manual targets users who intend to understand the functions of the VR4300, VR4305 (μPD30200), and VR4310 (μPD30210) and to design application systems using these microprocessors.


Purpose
This manual introduces the architecture and functions of the VR4300, VR4305, and VR4310 to users, following the organization described below.


Organization
This manual consists of the following contents:
• Introduction
• Pipeline operation
• Memory management system and cache
• Exception processing
• Floating-point operation
• Hardware
• Instruction set details


How to read this manual
It is assumed that the readers of this manual have a general knowledge of electrical engineering, logic circuits, and microcomputers.


Unless otherwise specified, the VR4300 is described as the representative product in this manual. When using this manual for the VR4305 or VR4310, read as follows:

VR4300 → VR4305
VR4300 → VR4310

The VR4400™ in this manual represents the VR4000™.
The VR4000 Series in this manual represents the VR4100™, VR4200™, VR4300, VR4305, VR4310, and VR4400.


To learn about the detailed function of a specific instruction:
→ Refer to Chapter 3 CPU Instruction Set Summary, Chapter 7 Floating-Point Operations, and Chapter 17 FPU Instruction Set Details.
To learn about the overall functions of the VR4300:
→ Read this manual in sequential order.


To learn about the electrical specifications of the VR4300:
→ Refer to the separately available data sheet.


Conventions
Data significance: Higher digits on the left and lower digits on the right
Active low representation: xxx (overscore over pin or signal name)
*: Footnote for item marked with * in the text
Caution: Information requiring particular attention
Remark: Supplementary information
Numerical representation: binary or decimal ... xxxx
                          hexadecimal ... 0xxxxx

Prefixes indicating powers of 2 (address space, memory capacity):
K (kilo): $2^{10} = 1024$
M (mega): $2^{20} = 1024^2$
G (giga): $2^{30} = 1024^3$
T (tera): $2^{40} = 1024^4$
P (peta): $2^{50} = 1024^5$
E (exa): $2^{60} = 1024^6$


Related documents
See also the following documents. The related documents indicated in this publication may include preliminary versions. However, preliminary versions are not marked as such.

Document Name                                         Document Number
VR4300, VR4305, VR4310 User's Manual                  This manual
μPD30200, 30210 Data Sheet                            U10116E
VR Series Application Note - Programming Guide        U10710E
VR4000 Series Application Note - Simulation Guide     U11788J (Japanese only)
General


This chapter outlines the 64-bit RISC microprocessors VR4300, VR4305
(μPD30200), and VR4310 (μPD30210).
1.1 Characteristics


The VR4300, VR4305, and VR4310 are members of the NEC VR Series™ of RISC (Reduced Instruction Set Computer) microprocessors. They are high-performance 64-bit microprocessors employing the RISC architecture developed by MIPS™.

Their instructions are upward-compatible with those of the VR3000™ Series and fully compatible with those of the VR4400 and VR4200. Therefore, existing applications can be used as is with the VR4300, VR4305, and VR4310.

The VR4300, VR4305, and VR4310 have the following features:

• Internal operating frequency:
  80 MHz max. (μPD30200-80),
  100 MHz max. (μPD30200-100),
  133 MHz max. (μPD30200-133, 30210-133),
  167 MHz max. (μPD30210-167)
• 64-bit architecture supporting 64-bit data processing
• Optimized 5-stage pipeline processing
• High-speed translation lookaside buffer (TLB) supporting virtual addresses (32 double entries)
• Address space  Physical: 32 bits
                 Virtual: 40 bits (64-bit mode), 31 bits (32-bit mode)
• Supports single-precision and double-precision floating-point operations
• On-chip cache memories  Instruction: 16 KB
                          Data: 8 KB
• Write-back cache system, which reduces store operations over the system bus
• 32-bit external bus interface facilitating system development
• The internal operating frequency is created by multiplying the external operating frequency (input clock and bus interface). The multiplier is selected at power-on:
  (μPD30200-80: x1, x2, or x3)
  (μPD30200-100: x1.5, x2, or x3)
  (μPD30200-133: x2, x3, or x4)
  (μPD30210-133: x2, x2.5, x3, or x4)
  (μPD30210-167: x2, x2.5, x3, x4, x5, or x6)
• Write buffer

• Low power mode (μPD30200-80, 30200-100 only)
  Reduces the internal and system bus clocks to 1/4 of the normal frequency, which also reduces power consumption.

• Software-compatible with the VR4400 and VR4200 and upward-compatible with the VR3000 Series

• Supply voltage: 3.3 V ± 0.3 V (μPD30200-80, 30200-100), 3.0 to 3.5 V (μPD30200-133, 30210-xxx)


1.2 Ordering Information 


Part Number           Package                            Maximum Operating Frequency (MHz)
μPD30200GD-80-LBB     120-pin plastic QFP (28 x 28 mm)   80
μPD30200GD-100-MBB    120-pin plastic QFP (28 x 28 mm)   100
μPD30200GD-133-MBB    120-pin plastic QFP (28 x 28 mm)   133
μPD30210GD-133-MBB    120-pin plastic QFP (28 x 28 mm)   133
μPD30210GD-167-MBB    120-pin plastic QFP (28 x 28 mm)   167


1.3 64-Bit Architecture 


The VR4300 is a 64-bit high-performance microprocessor. It can also execute 32-bit
applications even when it operates as a 64-bit microprocessor.


1.4 VR4300 Processor


Figure 1-1 shows the internal block diagram of the VR4300.

In addition to a high-performance integer operation unit, the VR4300 is equipped with a fully associative high-speed translation lookaside buffer (TLB) that has 32 entries, each mapping two pages; a data cache and an instruction cache; and an FPU.
[Figure 1-1: block diagram showing the System Interface (data/address and control), the Clock Generator (MasterClock input), the Instruction Cache, the Data Cache, CP0, the TLB, the Execution Unit, Instruction Address, and Pipeline Control]

Figure 1-1 Internal Block Diagram
1.4.1 Internal Block Configuration


System Interface allows the processor to access external resources such as
memories. It contains a 32-bit multiplexed address/data bus with per-byte parity,
clock signals, interrupt request signals, and various control signals. It is not
compatible with the system interface bus used on the VR4400 and VR4200.


Clock Generator generates a pipeline clock (PClock) based on an externally
input clock (MasterClock). The frequency of the PClock is selected by
setting the frequency ratio between the MasterClock and the PClock. This ratio
is set using the DivMode pins at power-on. (For the setting of the DivMode
pins, refer to Table 2-2 Clock/Control Interface Signals.) Table 1-1 indicates
the selectable frequency ratios. The system interface clock (SClock) usually has the
same frequency as the MasterClock.


Table 1-1 Frequency Ratio Between PClock and MasterClock

Product Name   DivMode Pins    Selectable Frequency Ratio (MasterClock : PClock)
VR4300         DivMode(1:0)    1:1.5*1, 1:2, 1:3, 1:4*2
VR4305         DivMode(1:0)    1:1, 1:2, 1:3
VR4310         DivMode(2:0)    1:2, 1:2.5, 1:3, 1:4, 1:5*3, 1:6*3

*1. Selectable with the 100 MHz model only (with the 133 MHz model, this setting is reserved).
2. Selectable with the 133 MHz model only (with the 100 MHz model, this setting is reserved).
3. Selectable with the 167 MHz model only (with the 133 MHz model, this setting is reserved).


If the RP bit of the Status register is set to 1 during operation, the frequencies of
the PClock and SClock can be reduced to 1/4 of the normal frequency*. Because
the PLL (Phase-Locked Loop) technique is employed, the skew (phase difference)
between the external clock and the internal operation clock is minimized.


* 100 MHz model of the VR4300 and the VR4305 only
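The clock rules above can be captured in a small lookup. The following is a sketch in C, assuming the multiplier sets listed in 1.1 Characteristics; the enum names, the kHz units, and the pclock_khz helper are hypothetical and are not an NEC interface. It does not check maximum operating frequencies or model the DivMode pin encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for the part numbers in 1.2 Ordering Information. */
typedef enum {
    UPD30200_80, UPD30200_100, UPD30200_133, UPD30210_133, UPD30210_167
} vr_model;

/* Selectable MasterClock-to-PClock multipliers from 1.1 Characteristics,
   stored as twice the multiplier so that x1.5 and x2.5 stay integers.
   Unused slots are 0. */
static const unsigned mult_x2_table[5][6] = {
    [UPD30200_80]  = {2, 4, 6},              /* x1, x2, x3 */
    [UPD30200_100] = {3, 4, 6},              /* x1.5, x2, x3 */
    [UPD30200_133] = {4, 6, 8},              /* x2, x3, x4 */
    [UPD30210_133] = {4, 5, 6, 8},           /* x2, x2.5, x3, x4 */
    [UPD30210_167] = {4, 5, 6, 8, 10, 12}    /* x2, x2.5, x3, x4, x5, x6 */
};

static bool multiplier_selectable(vr_model m, unsigned mult_x2) {
    if (mult_x2 == 0)                        /* 0 marks an unused slot */
        return false;
    for (size_t i = 0; i < 6; i++)
        if (mult_x2_table[m][i] == mult_x2)
            return true;
    return false;
}

/* Returns the PClock frequency in kHz, or 0 if the multiplier (or low power
   mode) is not available on the given model. low_power models RP = 1, which
   divides the clock by 4 on the uPD30200-80 and uPD30200-100 only. */
static uint32_t pclock_khz(vr_model m, uint32_t masterclock_khz,
                           unsigned mult_x2, bool low_power) {
    if (!multiplier_selectable(m, mult_x2))
        return 0;
    if (low_power && m != UPD30200_80 && m != UPD30200_100)
        return 0;
    uint32_t pclock = masterclock_khz * mult_x2 / 2;
    return low_power ? pclock / 4 : pclock;
}
```

For example, a 66,667 kHz MasterClock with the x1.5 setting (mult_x2 = 3) gives 100,000 kHz on the μPD30200-100, while the same setting is rejected (0) on the μPD30200-133.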


Instruction Cache is direct-mapped, virtually-indexed, and physically-tagged. 
The capacity is 16 KB. 


Execution Unit has the hardware resources to execute integer and floating-point
instructions. It has a 64-bit register file, a 64-bit integer/mantissa datapath, and a
12-bit exponent datapath. It is provided with a dedicated multiplier to
process multiply instructions at high speed.
Coprocessor 0 (CP0) has the memory management unit (MMU) and handles
exception processing. The MMU handles address translation and checks memory
accesses that occur between different memory segments (user, supervisor, or
kernel). The translation lookaside buffer (TLB) is used to translate virtual to
physical addresses.


Data Cache is a direct-mapped, virtually-indexed, and physically-tagged
write-back cache. The capacity is 8 KB.


Instruction Address calculates the effective address of the next instruction to be 
fetched. It contains the incrementer for the Program Counter (PC), the target 
address adder, and the conditional branch address selector. 


Pipeline Control ensures that the instruction pipeline operates correctly when a
pipeline stall or an exception occurs.
1.4.2 CPU Registers


The processor provides the following registers:

• Thirty-two 64-bit general purpose registers (GPRs)
• Thirty-two 64-bit floating-point operation registers (FPRs)

In addition, the processor provides the following special registers:

• 64-bit Program Counter (PC)
• 64-bit HI register, containing the high-order doubleword of an integer multiply result or the remainder of an integer divide
• 64-bit LO register, containing the low-order doubleword of an integer multiply result or the quotient of an integer divide
• 1-bit Load/Link register (LLBit)
• 32-bit floating-point Implementation/Revision register (FCR0)
• 32-bit floating-point Control/Status register (FCR31)
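To illustrate how a 64-bit multiply fills both registers, the following C sketch models the unsigned doubleword multiply (DMULTU) and divide (DDIVU) results. It assumes the usual MIPS convention that a divide leaves the quotient in LO and the remainder in HI; the hilo type and function names are hypothetical, and the 128-bit product is built from 32-bit halves so no compiler-specific type is needed.

```c
#include <stdint.h>

/* Illustrative register pair; not an actual processor interface. */
typedef struct { uint64_t hi, lo; } hilo;

/* DMULTU: unsigned 64 x 64 -> 128-bit product split across HI and LO. */
static hilo dmultu(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;   /* bits 63..0 of the product */
    uint64_t p1 = a_lo * b_hi;   /* cross terms, weight 2^32 */
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */
    /* Middle 32-bit column plus the carry out of the low column;
       the sum is at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
    hilo r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* DDIVU: LO receives the quotient and HI the remainder. Division by zero
   gives an unpredictable result on the hardware; returning zeros here is
   an arbitrary choice that avoids undefined behaviour in C. */
static hilo ddivu(uint64_t a, uint64_t b) {
    if (b == 0) {
        hilo z = { 0, 0 };
        return z;
    }
    hilo r = { a % b, a / b };   /* { hi = remainder, lo = quotient } */
    return r;
}
```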


Two of the general purpose registers have assigned functions:

• r0 is hardwired to a value of zero, and can be used as the target register for any instruction whose result is to be discarded. r0 can also be used as a source when a zero value is needed.

• r31 is the link register used by the JAL and JALR instructions. It can also be used by other instructions, but make sure that other data used in calculations does not overlap with the register used by the JAL/JALR instruction.
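The two assigned registers can be modelled in a few lines. In this hypothetical C sketch, writes to r0 are dropped and reads of r0 return zero; jal_link assumes the standard MIPS behaviour that JAL stores the address of the instruction following its branch delay slot (the JAL address plus 8) in r31.

```c
#include <stdint.h>

/* Minimal model of the general purpose register file (illustrative only). */
typedef struct { uint64_t r[32]; } gpr_file;

static uint64_t gpr_read(const gpr_file *f, unsigned n) {
    return n == 0 ? 0 : f->r[n & 31];   /* r0 always reads as zero */
}

static void gpr_write(gpr_file *f, unsigned n, uint64_t v) {
    if (n != 0)                         /* writes to r0 are discarded */
        f->r[n & 31] = v;
}

/* JAL: return address = address of the JAL + 8 (skips the delay slot). */
static void jal_link(gpr_file *f, uint64_t pc_of_jal) {
    gpr_write(f, 31, pc_of_jal + 8);
}
```

This also shows why data kept in r31 is overwritten by any JAL that executes in the meantime.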


Furthermore, the processor contains registers in the system control coprocessor
(CP0), which perform exception processing and address management.


CPU registers can operate as either 32-bit or 64-bit registers, depending on the
VR4300 processor mode of operation.


Figure 1-2 shows the CPU registers. 
[Figure 1-2: general purpose registers r0 (= 0) through r31 (= link address), bits 63..0; floating-point registers, bits 63..0; multiply and divide registers HI and LO, bits 63..0; Load/Link register (LLBit), 1 bit; floating-point control registers FCR0 (Implementation/Revision) and FCR31 (Control/Status), bits 31..0]

Figure 1-2 CPU Registers


The VR4300 processor has no Program Status Word (PSW) register as such; this
function is covered by the Status and Cause registers incorporated within the System
Control Coprocessor (CP0). For the CP0 registers, refer to 1.4.5 System Control
Coprocessor (CP0).
1.4.3 CPU Instruction Set Overview


Each CPU instruction is 32 bits long. As shown in Figure 1-3, there are three 
instruction formats: 


* immediate (I-type) 
* jump (J-type) 
* register (R-type) 


I-Type (Immediate):  op (bits 31..26), rs (25..21), rt (20..16), immediate (15..0)
J-Type (Jump):       op (bits 31..26), target (25..0)
R-Type (Register):   op (bits 31..26), rs (25..21), rt (20..16), rd (15..11), sa (10..6), funct (5..0)


Figure 1-3 CPU Instruction Formats
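Because every field sits at a fixed bit position, decoding is a matter of shifts and masks. The C sketch below extracts every field of Figure 1-3 from a 32-bit instruction word; the struct and function names are hypothetical, and which fields are meaningful depends on the op value.

```c
#include <stdint.h>

/* All fields of the three formats; only some apply to a given instruction. */
typedef struct {
    unsigned op, rs, rt, rd, sa, funct;
    uint16_t immediate;   /* I-type: bits 15..0 */
    uint32_t target;      /* J-type: bits 25..0 */
} insn_fields;

static insn_fields decode_fields(uint32_t w) {
    insn_fields f;
    f.op        = (w >> 26) & 0x3F;   /* bits 31..26 */
    f.rs        = (w >> 21) & 0x1F;   /* bits 25..21 */
    f.rt        = (w >> 16) & 0x1F;   /* bits 20..16 */
    f.rd        = (w >> 11) & 0x1F;   /* bits 15..11 */
    f.sa        = (w >> 6)  & 0x1F;   /* bits 10..6 */
    f.funct     =  w        & 0x3F;   /* bits 5..0 */
    f.immediate = (uint16_t)(w & 0xFFFF);
    f.target    =  w & 0x03FFFFFF;
    return f;
}
```

As a usage example, 0x27BDFFF0 (addiu sp, sp, -16 in the standard MIPS encoding) yields op = 9, rs = rt = 29, and immediate = 0xFFF0, while 0x03E00008 (jr ra) yields op = 0, rs = 31, and funct = 8.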


The instruction set can be further divided into the following groups:


• Load and Store instructions move data between memory and general purpose registers. They are all immediate (I-type) instructions, since the only addressing mode supported is base register plus 16-bit signed immediate offset.


• Computational instructions perform arithmetic, logical, shift, multiply, and divide operations on values in registers. They include register (R-type, in which both the operands and the result are stored in registers) and immediate (I-type, in which one operand is a 16-bit signed immediate value) formats.


• Jump and Branch instructions change the control flow of a program. Jumps are made either to an address formed by combining a 26-bit target address with the high-order bits of the Program Counter (J-type format) or to an address held in a register (R-type format). Branches are made to the address formed by adding a 16-bit offset to the Program Counter (I-type). Jump And Link instructions save their return address in register 31.
• Coprocessor instructions (CPz) perform operations in the coprocessors. Coprocessor load and store instructions are I-type. Unlike CP0 instructions, CPz instructions are not specific to any one coprocessor. (Refer to Chapter 7 Floating-Point Operations.)


• Coprocessor 0 (system coprocessor, CP0) instructions perform operations on CP0 registers to control the memory-management and exception-handling facilities of the processor.


• Special instructions perform system call exception and breakpoint exception operations, or cause a branch to the general exception-handling vector based upon the result of a comparison. These instructions occur in both R-type (both the operands and the result are registers) and I-type (one operand is a 16-bit immediate value) formats.
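The jump and branch address calculations in the list above can be written out as follows. This C sketch covers only 32-bit addresses and assumes the standard MIPS rules: the 26-bit target and the 16-bit branch offset are both word offsets (shifted left two bits), and both are relative to the address of the branch delay slot (PC + 4). In 64-bit mode the same idea applies with the high-order 36 bits of PC + 4. The function names are hypothetical.

```c
#include <stdint.h>

/* J/JAL: keep the high-order 4 bits of PC + 4 and replace the rest with
   the 26-bit target shifted left two bits. */
static uint32_t jump_target(uint32_t pc, uint32_t insn) {
    uint32_t target = insn & 0x03FFFFFF;
    return ((pc + 4) & 0xF0000000u) | (target << 2);
}

/* Branches: sign-extend the 16-bit offset, shift it left two bits, and
   add it to PC + 4 (unsigned arithmetic wraps, so negative offsets work). */
static uint32_t branch_target(uint32_t pc, uint32_t insn) {
    int32_t offset = (int16_t)(insn & 0xFFFF);
    return (pc + 4) + ((uint32_t)offset << 2);
}
```

For example, a branch at 0x80001000 whose offset field is 0xFFFF (-1) targets 0x80001000 itself, and an offset of 3 targets 0x80001010.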


For each instruction, refer to Chapter 3 CPU Instruction Set Summary and 
Chapter 16 CPU Instruction Set Details. 
1.4.4 Data Formats and Addressing


The VR4300 processor uses four data formats: a 64-bit doubleword, a 32-bit word,
a 16-bit halfword, and an 8-bit byte. Byte ordering within the larger data
formats (halfword, word, and doubleword) can be configured as either big-endian or
little-endian. When the VR4300 processor is configured as a big-endian system,
byte 0 is the most-significant (leftmost) byte, providing compatibility with
MC68000™ and IBM370™ conventions. Figure 1-4 shows this configuration.


Word       Bits: 31..24  23..16  15..8  7..0
Address
12               12      13      14     15     (higher address)
8                8       9       10     11
4                4       5       6      7
0                0       1       2      3      (lower address)


Figure 1-4 Big-Endian Byte Ordering


Remarks 1. The most-significant byte is the lowest address. 
2. A word is addressed by the address of the most-significant byte. 


When configured as a little-endian system, byte 0 is always the least-significant
(rightmost) byte, which is compatible with iAPX™ x86 and DEC VAX™
conventions. Figure 1-5 shows this configuration.


Unless otherwise specified, little-endian byte ordering is used throughout this manual.


Word       Bits: 31..24  23..16  15..8  7..0
Address
12               15      14      13     12     (higher address)
8                11      10      9      8
4                7       6       5      4
0                3       2       1      0      (lower address)


Figure 1-5 Little-Endian Byte Ordering


Remarks 1. The least-significant byte is the lowest address. 
2. A word is addressed by the address of the least-significant byte. 
Doubleword  Bits: 63..56  55..48  47..40  39..32  31..24  23..16  15..8  7..0
Address
16                16      17      18      19      20      21      22     23    (higher address)
8                 8       9       10      11      12      13      14     15
0                 0       1       2       3       4       5       6      7     (lower address)

(Fields marked in the figure: word = bits 63..32, halfword = bits 31..16, byte = bits 15..8)


Figure 1-6 Big-Endian Data in a Doubleword


Remarks 1. The most-significant byte is the lowest address. 
2. A word is addressed by the address of the most-significant byte. 


Doubleword  Bits: 63..56  55..48  47..40  39..32  31..24  23..16  15..8  7..0
Address
16                23      22      21      20      19      18      17     16    (higher address)
8                 15      14      13      12      11      10      9      8
0                 7       6       5       4       3       2       1      0     (lower address)

(Fields marked in the figure: word = bits 63..32, halfword = bits 31..16, byte = bits 15..8)


Figure 1-7 Little-Endian Data in a Doubleword


Remarks 1. The least-significant byte is the lowest address. 
2. A word is addressed by the address of the least-significant byte. 
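The two doubleword layouts differ only in which byte lane a given address selects. The following C sketch, with hypothetical function names, returns the byte at address addr from the aligned doubleword that contains it, following Figures 1-6 and 1-7: in big-endian mode byte offset 0 is bits 63..56, and in little-endian mode it is bits 7..0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Right-shift (in bits) that moves the byte at addr into bits 7..0 of the
   aligned doubleword containing it. */
static unsigned byte_shift(uint64_t addr, bool big_endian) {
    unsigned k = (unsigned)(addr & 7);   /* byte offset within the doubleword */
    return big_endian ? (7 - k) * 8 : k * 8;
}

static uint8_t load_byte(uint64_t dword, uint64_t addr, bool big_endian) {
    return (uint8_t)(dword >> byte_shift(addr, big_endian));
}
```

If each memory byte holds its own address, the doubleword at address 0 reads as 0x0001020304050607 in big-endian mode and 0x0706050403020100 in little-endian mode, and load_byte returns 2 for address 2 in both cases.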
The CPU uses byte addressing for halfword, word, and doubleword accesses with
the following alignment constraints:


• Halfword accesses must be aligned on an even byte boundary (0, 2, 4...).

• Word accesses must be aligned on a byte boundary divisible by four (0, 4, 8...).

• Doubleword accesses must be aligned on a byte boundary divisible by eight (0, 8, 16...).
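All three constraints reduce to one rule: the address must be a multiple of the access size. A minimal C check with a hypothetical function name, assuming size is 1, 2, 4, or 8. On MIPS processors an access that fails this test raises an Address Error exception; the sketch only reports the condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* size must be a power of two (1, 2, 4, or 8 bytes). */
static bool is_aligned(uint64_t addr, unsigned size) {
    return (addr & (uint64_t)(size - 1)) == 0;
}
```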


The following special instructions load and store words and doublewords that are not aligned on
4-byte (word) or 8-byte (doubleword) boundaries:

LWL  LWR  SWL  SWR
LDL  LDR  SDL  SDR


These instructions are used in pairs to access data that is not aligned on a
boundary. Accessing misaligned data in this way requires one additional PCycle
compared with accessing aligned data.


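Figure 1-8 shows how a misaligned word at byte address 3 is accessed in
big-endian and little-endian modes.


Big-endian:
  Word at address 4 (higher address): bits 31..24 = byte 4, bits 23..16 = byte 5, bits 15..8 = byte 6
  Word at address 0 (lower address): bits 7..0 = byte 3

Little-endian:
  Word at address 4 (higher address): bits 23..16 = byte 6, bits 15..8 = byte 5, bits 7..0 = byte 4
  Word at address 0 (lower address): bits 31..24 = byte 3


Figure 1-8 Misaligned Word Addressing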
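The LWL/LWR pairing is easiest to see in a software model. The C sketch below emulates the two instructions for a big-endian system with 32-bit registers only (the sign extension performed in 64-bit mode is not modelled), using a byte array as memory; the function names are hypothetical. To load the misaligned word at address 3 of Figure 1-8, LWL is issued with address 3 and LWR with address 6 (3 + 3); in little-endian mode the roles of the two instructions are swapped.

```c
#include <stdint.h>

/* Big-endian read of the aligned word at byte address a. */
static uint32_t load_be32(const uint8_t *mem, uint32_t a) {
    return (uint32_t)mem[a] << 24 | (uint32_t)mem[a + 1] << 16 |
           (uint32_t)mem[a + 2] << 8 | (uint32_t)mem[a + 3];
}

/* LWL (big-endian): the bytes from addr to the end of its aligned word are
   placed in the high-order end of rt; the remaining low-order bytes of rt
   are kept. */
static uint32_t lwl_be(const uint8_t *mem, uint32_t addr, uint32_t rt) {
    unsigned k = addr & 3;                              /* offset in word */
    uint32_t word = load_be32(mem, addr & ~3u);
    uint32_t keep = (k == 0) ? 0 : ((1u << (8 * k)) - 1);
    return (word << (8 * k)) | (rt & keep);
}

/* LWR (big-endian): the bytes from the start of the aligned word up to addr
   are placed in the low-order end of rt; the remaining high-order bytes of
   rt are kept. */
static uint32_t lwr_be(const uint8_t *mem, uint32_t addr, uint32_t rt) {
    unsigned k = addr & 3;
    uint32_t word = load_be32(mem, addr & ~3u);
    uint32_t keep = (k == 3) ? 0 : ~((1u << (8 * (k + 1))) - 1);
    return (word >> (8 * (3 - k))) | (rt & keep);
}
```

With memory bytes 0 through 7 holding the values 0 through 7, calling lwl_be at address 3 and then lwr_be at address 6 leaves 0x03040506 in rt, which matches the big-endian view of bytes 3, 4, 5, and 6 in Figure 1-8.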
1.4.5 System Control Coprocessor (CP0)

The MIPS ISA defines four coprocessors (CP0 through CP3). CP0 is an
internal system control coprocessor that supports the virtual memory system and
exception processing. CP1 is an internal floating-point unit. CP2 is reserved for
future definition, and CP3 is also reserved for expansion. If a CP3 instruction is
executed, a reserved instruction exception occurs.


CP0 translates virtual addresses into physical addresses, selects the operating mode
(kernel, supervisor, or user mode), and controls exceptions. It also controls the
cache subsystem and provides facilities for analyzing the causes of errors and
returning from error processing. The CP0 registers of the VR4300 are the same as
those of the VR4200. However, because the VR4300 does not have a parity check
function, its Parity Error register (26) and Cache Error register (27) are not actually
used. These registers are defined to maintain compatibility with the VR4200.


Figure 1-9 shows the CP0 registers, and Table 1-2 briefly explains each register. For
details of the registers related to the virtual memory system, refer to Chapter
5 Memory Management System; for details of the registers used for
exception processing, refer to Chapter 6 Exception Processing.
[Figure 1-9: the CP0 registers by register number (0 to 31), grouped into memory-management registers and exception-processing registers; names and numbers are as listed in Table 1-2]

Figure 1-9 CP0 Registers
Table 1-2 System Control Coprocessor (CP0) Register Definitions

Number  Register        Description
0       Index           Programmable pointer into TLB array
1       Random          Pseudorandom pointer into TLB array (read only)
2       EntryLo0        Low half of TLB entry for even virtual address (VPN)
3       EntryLo1        Low half of TLB entry for odd virtual address (VPN)
4       Context         Pointer to kernel virtual page table entry (PTE) in 32-bit mode
5       PageMask        Page size specification
6       Wired           Number of wired TLB entries
7       -               Reserved for future use
8       BadVAddr        Virtual address at which the most recent error occurred
9       Count           Timer count
10      EntryHi         High half of TLB entry (including ASID)
11      Compare         Timer compare value
12      Status          Operation status setting
13      Cause           Cause of the last exception
14      EPC             Exception Program Counter
15      PRId            Processor Revision Identifier
16      Config          Memory system mode setting
17      LLAddr          Load Linked instruction address display
18      WatchLo         Memory reference trap address low bits
19      WatchHi         Memory reference trap address high bits
20      XContext        Pointer to kernel virtual PTE table in 64-bit mode
21-25   -               Reserved for future use
26      Parity Error*   Cache parity bits
27      Cache Error*    Cache Error and Status register
28      TagLo           Cache Tag register low
29      TagHi           Cache Tag register high
30      ErrorEPC        Error Exception Program Counter
31      -               Reserved for future use


* These registers are defined to maintain compatibility with the VR4200 and are not used by the
hardware of the VR4300.
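For software that refers to CP0 registers by number, the table can be mirrored as a C enumeration. The identifiers below are hypothetical (they are not taken from an NEC header); only the numbers come from Table 1-2. Registers 26 and 27 are included although the VR4300 hardware does not use them.

```c
#include <stdbool.h>

/* CP0 register numbers from Table 1-2. */
enum cp0_reg {
    CP0_INDEX    = 0,  CP0_RANDOM   = 1,  CP0_ENTRYLO0 = 2,  CP0_ENTRYLO1 = 3,
    CP0_CONTEXT  = 4,  CP0_PAGEMASK = 5,  CP0_WIRED    = 6,  /* 7: reserved */
    CP0_BADVADDR = 8,  CP0_COUNT    = 9,  CP0_ENTRYHI  = 10, CP0_COMPARE  = 11,
    CP0_STATUS   = 12, CP0_CAUSE    = 13, CP0_EPC      = 14, CP0_PRID     = 15,
    CP0_CONFIG   = 16, CP0_LLADDR   = 17, CP0_WATCHLO  = 18, CP0_WATCHHI  = 19,
    CP0_XCONTEXT = 20, /* 21-25: reserved */
    CP0_PERR     = 26, CP0_CACHEERR = 27, /* VR4200 compatibility only */
    CP0_TAGLO    = 28, CP0_TAGHI    = 29, CP0_ERROREPC = 30  /* 31: reserved */
};

/* True for the register numbers that Table 1-2 lists as reserved. */
static bool cp0_is_reserved(unsigned n) {
    return n == 7 || (n >= 21 && n <= 25) || n == 31;
}
```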
1.4.6 Floating-Point Unit (FPU), CP1


The floating-point unit (FPU) operates as a coprocessor for the CPU and performs
arithmetic operations on floating-point values. The FPU, with associated system
software, fully conforms to the requirements of ANSI/IEEE Standard 754-1985,
IEEE Standard for Binary Floating-Point Arithmetic.


The FPU includes: 


¢ Full 64-bit Operation. The FPU can contain either 16 64-bit 
registers to hold single-precision or double-precision values. Another 
sixteen floating-point registers can be used by setting the FR bit of 
the Status register to 1. Moreover, a 32-bit Control/Status register is 
provided, conforming to the IEEE exception processing standard. 


• Load and Store Instruction Set. Like the CPU, the FPU uses a
load- and store-based instruction set. Floating-point operations are
started in a single cycle; however, their execution does not overlap
other operations.


• Sharing Hardware. The VR4300 has no separate FPU hardware;
floating-point operations are processed by the same hardware that
executes integer instructions.
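The register count described in the first item can be derived from the Status register value. The sketch below assumes, as in other MIPS R4000-family processors, that FR is bit 26 of the Status register; that bit position is not stated in this section, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: FR is bit 26 of the CP0 Status register (MIPS R4000 family).
 * Check this against the Status register description before relying on it. */
#define STATUS_FR (UINT32_C(1) << 26)

/* Number of 64-bit floating-point registers usable for a given Status value:
 * 16 when FR = 0, and another 16 (32 in total) when FR = 1. */
unsigned fpu_register_count(uint32_t status)
{
    return (status & STATUS_FR) ? 32u : 16u;
}
```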


1.4.7 Internal Cache 


The VR4300 has an instruction cache and a data cache to improve the efficiency
of pipelining. Each cache has a data width of 64 bits and can be accessed in 1
clock. The instruction cache and data cache can be accessed in parallel. The
instruction cache has a capacity of 16K bytes, while the data cache has a capacity
of 8K bytes.


For the details of the cache, refer to Chapter 11 Cache Memory. 
1.5 Memory Management System (MMU)


The VR4300 processor has a 32-bit physical addressing range of 4 GB. However,
since systems rarely implement a physical memory space this large, the CPU
gives the programmer a logically expanded memory space by translating
addresses in a large virtual address space into physical addresses. The VR4300
processor supports the following two addressing modes:


• 32-bit mode, in which the virtual address space is divided into 2 GB
per user process and 2 GB for the kernel.


• 64-bit mode, in which the virtual address space is expanded to
1 TB (2^40 bytes) of user virtual address space.
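The two user address-space sizes translate into simple range checks. This C sketch assumes the usual MIPS layout in which the user segment occupies the low end of the address space (0x00000000 to 0x7FFFFFFF in 32-bit mode); the function names are hypothetical, and the kernel segments are not distinguished from one another.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit mode: 2 GB per user process (assumed to be the low half) and
 * 2 GB for the kernel (the high half). */
bool is_user_address_32(uint32_t va)
{
    return va < UINT32_C(0x80000000);
}

/* 64-bit mode: the user virtual address space is 1 TB (2^40 bytes). */
bool is_user_address_64(uint64_t va)
{
    return va < (UINT64_C(1) << 40);
}
```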


A detailed description of these address spaces is given in Chapter 5 Memory 
Management System. 


1.5.1 Translation Lookaside Buffer (TLB) 


Virtual memory mapping is assisted by a translation lookaside buffer, which holds
virtual-to-physical address translations. This fully associative, on-chip TLB
contains 32 entries, each of which maps a pair of variable-sized pages ranging
from 4 KB to 16 MB.
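Because each entry maps a pair of pages of the same size, the address bit just above the page offset selects the even or odd page of the pair, and the remaining high-order bits form the tag. The sketch below illustrates that split and the resulting reach of the 32-entry TLB; it assumes a page size that is a power of two, and the names (including VPN2, borrowed from general MIPS terminology) are illustrative.

```c
#include <stdint.h>

/* Split of a virtual address for a TLB entry that maps a pair of pages. */
typedef struct {
    uint64_t vpn2;    /* tag: va / (2 * page_size) */
    unsigned odd;     /* 0 = even page, 1 = odd page of the pair */
    uint64_t offset;  /* byte offset within the page */
} tlb_split;

/* page_size must be a power of two between 4 KB and 16 MB. */
tlb_split tlb_split_address(uint64_t va, uint64_t page_size)
{
    tlb_split s;
    s.offset = va & (page_size - 1);
    s.odd = (unsigned)((va / page_size) & 1u);
    s.vpn2 = va / (2 * page_size);
    return s;
}

/* Memory mapped at once when all 32 entries use the same page size. */
uint64_t jtlb_max_coverage(uint64_t page_size)
{
    return 32u * 2u * page_size;
}
```

With 4 KB pages the 32 entries cover 256 KB; with 16 MB pages they cover 1 GB.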


Joint TLB (JTLB) 


The TLB can hold both instruction and data addresses, and is thus also referred to 
as a joint TLB (JTLB). 


An address translation value is tagged with the high-order bits of its virtual 
address (the number of these bits depends upon the size of the page) and a per- 
process identifier. If there is no matching entry in the TLB, an exception occurs 
and software writes the entry contents to the on-chip TLB from a page table in 
memory. The JTLB entry to be rewritten is selected by a value in either the 
Random or Index register. 
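The lookup and refill flow described above can be modelled in software as follows. This is a simplified sketch, not the hardware algorithm: the entry layout is hypothetical (the real EntryHi, EntryLo, and PageMask formats are not reproduced), and the physical address is formed by adding the page offset to a per-page physical base.

```c
#include <stdbool.h>
#include <stdint.h>

#define JTLB_ENTRIES 32

/* Hypothetical, simplified JTLB entry. */
typedef struct {
    bool     valid;         /* slot holds a translation */
    uint64_t page_size;     /* 4 KB ... 16 MB, power of two */
    uint64_t vpn2;          /* va / (2 * page_size): tag for the page pair */
    uint8_t  asid;          /* per-process identifier */
    bool     global;        /* match regardless of ASID */
    uint64_t phys_base[2];  /* physical base of the even and odd page */
} jtlb_entry;

/* Fully associative lookup: every valid entry is compared with the address.
 * Returns false on a miss, where the real processor raises a TLB exception. */
bool jtlb_translate(const jtlb_entry tlb[JTLB_ENTRIES], uint64_t va,
                    uint8_t asid, uint64_t *pa)
{
    for (int i = 0; i < JTLB_ENTRIES; i++) {
        const jtlb_entry *e = &tlb[i];
        if (!e->valid || va / (2 * e->page_size) != e->vpn2)
            continue;                       /* empty slot or tag mismatch */
        if (!e->global && e->asid != asid)
            continue;                       /* belongs to another process */
        unsigned odd = (unsigned)((va / e->page_size) & 1u);
        *pa = e->phys_base[odd] + (va & (e->page_size - 1));
        return true;
    }
    return false;
}

/* Refill step of the miss handler: software copies an entry from its page
 * table into the slot named by the Index (or Random) register value. */
void jtlb_write(jtlb_entry tlb[JTLB_ENTRIES], unsigned index, jtlb_entry e)
{
    tlb[index % JTLB_ENTRIES] = e;
}
```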
Instruction Micro-TLB (ITLB)


The VR4300 processor has a two-entry instruction micro-TLB (ITLB) that
assists in instruction address translation. The ITLB cannot be accessed directly
by software. Instruction fetches use this TLB, while data accesses use the joint
TLB; a miss in the micro-TLB stalls the pipeline until the micro-TLB is refilled
from the joint TLB. The micro-TLB is fully associative and uses the
least-recently-used (LRU) replacement algorithm. Each micro-TLB entry maps
4 KB of virtual space to physical space. Because no JTLB page is smaller than
4 KB, each ITLB entry is always a subset of a single JTLB entry.
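A two-entry, fully associative micro-TLB with LRU replacement needs only one bit of state to record which entry was used least recently. The C sketch below models that behavior; the JTLB is represented by a caller-supplied function, and demo_jtlb is a hypothetical stand-in used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-entry micro-TLB; each entry maps one 4 KB page. */
typedef struct {
    uint64_t vpn[2];      /* virtual page number (va >> 12) */
    uint64_t pframe[2];   /* physical page number (pa >> 12) */
    bool     valid[2];
    unsigned lru;         /* index of the least recently used entry */
} itlb;

/* Refill source: returns the physical page number for a virtual one. */
typedef uint64_t (*jtlb_lookup_fn)(uint64_t vpn);

/* Hypothetical stand-in for a JTLB lookup, for demonstration only. */
uint64_t demo_jtlb(uint64_t vpn) { return vpn + 0x1000; }

/* Translates va; sets *miss when a refill (a pipeline stall) was needed. */
uint64_t itlb_translate(itlb *t, uint64_t va, jtlb_lookup_fn jtlb, bool *miss)
{
    uint64_t vpn = va >> 12;
    for (unsigned i = 0; i < 2; i++) {
        if (t->valid[i] && t->vpn[i] == vpn) {
            t->lru = 1 - i;              /* the other entry is now LRU */
            *miss = false;
            return (t->pframe[i] << 12) | (va & 0xFFF);
        }
    }
    unsigned victim = t->lru;            /* replace the LRU entry */
    t->vpn[victim] = vpn;
    t->pframe[victim] = jtlb(vpn);
    t->valid[victim] = true;
    t->lru = 1 - victim;
    *miss = true;
    return (t->pframe[victim] << 12) | (va & 0xFFF);
}
```

After accesses to pages A, B, and A, a miss on page C replaces B, because A was used more recently.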


1.5.2 Operating Modes 
The VR4300 processor has three operating modes:
• User mode
• Supervisor mode
• Kernel mode


The manner in which memory addresses are translated or mapped depends on the 
operating mode of the CPU; this is described in Chapter 5 Memory 
Management System. 


1.6 Instruction Pipeline 


The VR4300 has a 5-stage instruction pipeline. This pipeline is used for
floating-point operations as well as for integer operations. Under normal
conditions, the pipeline executes one instruction per cycle.
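As a worked example of what one instruction per cycle means for a 5-stage pipeline: assuming no stalls, interlocks, or cache misses, the first instruction completes after 5 cycles and each later instruction completes one cycle after its predecessor, so N instructions take N + 4 cycles. The helper below (a hypothetical name) encodes only that idealized arithmetic; real code also incurs interlocks such as the load and store delay slot interlocks.

```c
#define PIPELINE_STAGES 5u

/* Cycles to complete n instructions in an ideal, stall-free pipeline. */
unsigned long ideal_cycles(unsigned long n_instructions)
{
    if (n_instructions == 0)
        return 0;
    return n_instructions + (PIPELINE_STAGES - 1);
}
```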


The pipeline of the VR4300 operates at a frequency determined by the
setting of the DivMode(1:0)* pins. For details, refer to Chapter 4 Pipeline.


* In the VR4300 and VR4305. In the VR4310, DivMode(2:0).
2.1 Pin Configuration (Top View)


• 120-pin plastic QFP (28 x 28 mm)
µPD30200GD-80-LBB
µPD30200GD-100-MBB
µPD30200GD-133-MBB
µPD30210GD-133-MBB
µPD30210GD-167-MBB


[Figure: 120-pin QFP pin configuration (top view); the pin-by-pin drawing is not legible in this copy. One pin that is VDD on the µPD30200-xxx serves as DivMode2 on the µPD30210-xxx.]


Remark ( ): Pin name of the µPD30210-xxx
PIN NAME
ColdReset : Cold Reset
DivMode(1:0)* : Divide Mode
EOK : External OK
EReq : External Request
EValid : External Valid
Int(4:0) : Interrupt Request
JTCK : JTAG Clock Input
JTDI : JTAG Data In
JTDO : JTAG Data Out
JTMS : JTAG Command Signal
MasterClock : Master Clock
NMI : Non-maskable Interrupt Request
PLLCap(1:0) : Phase Locked Loop Capacitance
PMaster : Processor Master
PReq : Processor Request
PValid : Processor Valid
Reset : Reset
SyncIn : Synchronization Clock Input
SyncOut : Synchronization Clock Output
SysAD(31:0) : System Address/Data Bus
SysCmd(4:0) : System Command Data ID Bus
TClock : Transmit Clock
VDD : Power Supply
GND : Ground
VDDP : VDD for PLL
GNDP : GND for PLL


* In the µPD30200-xxx. DivMode(2:0) in the µPD30210-xxx.
2.2 Pin Functions


2.2.1 System Interface Signals 


The system interface signals are used to connect the VR4300 with external
devices in the system. Table 2-1 shows the functions of these signals.


Table 2-1 System Interface Signals


Signal Name | Definition | I/O | Function
SysAD(31:0) | System address/data bus | I/O | 32-bit address/data bus. Used to transmit or receive addresses or data between the processor and the external agent.
SysCmd(4:0) | System command/data ID bus | I/O | 5-bit bus. Used to transfer commands or data identifiers between the processor and the external agent.
EReq | External request | Input | Asserted active when the external agent requests use of the system interface from the processor.
PReq | Processor request | Output | Asserted active when the processor requests use of the system interface from the external agent. If a protocol error is detected on the system interface, this signal toggles in synchronization with MasterClock, with a period that is a multiple of the SClock period.
EValid | External agent valid | Input | Asserted active when the external agent drives a valid address or valid data onto the SysAD bus and a valid command/data identifier is on the SysCmd bus.
PValid | Processor valid | Output | Asserted active when the processor drives a valid address or valid data onto the SysAD bus and a valid command/data identifier is on the SysCmd bus.
PMaster | Processor master | Output | Asserted active when the processor is the master of the system interface bus.
EOK | External ready | Input | Asserted active when the external agent is ready to accept a processor request.


User’s Manual U10504EJ7VOUM00 




2.2.2 Clock/Control Interface Signals 


These interface signals are used to supply or control clocks. Table 2-2 shows the
functions of the signals.


Table 2-2. Clock/Control Interface Signals (1/3)


Signal Name | Definition | I/O | Function
MasterClock | Master clock | Input | Inputs the master clock. The internal operating frequency is determined by the frequency of this signal and the setting of the DivMode pins.
TClock | Transmit/receive clock | Output | Outputs the transmit/receive clock at the same frequency as MasterClock.
SyncOut | Synchronization clock output | Output | Outputs a synchronization clock. Connect this pin to SyncIn through wiring that models the connection between TClock and the external agent.
SyncIn | Synchronization clock input | Input | Inputs a synchronization clock.
VDDP | Static VDD for PLL | - | Static VDD for the internal PLL circuit.
GNDP | Static GND for PLL | - | Static GND for the internal PLL circuit.
PLLCap(1:0) | Adjusting PLL | - | A capacitor for adjusting the internal PLL circuit of the processor is connected to these pins.
DivMode | Internal operating frequency mode | Input | Indicates the ratio at which the internal PClock is generated from MasterClock.


Normally, the frequency of TClock is the same as that of MasterClock.

Do not change the value of these pins after it has been set at power-on;
otherwise, operation is not guaranteed.


The relationship between the DivMode values and the frequency ratio of each
product is shown below.


Remark The maximum PClock frequency equals the maximum internal operating
frequency of each product, regardless of the frequency ratio. (Refer to
1.2 Ordering Information.)


• VR4300 (µPD30200-100)
DivMode(1:0) | Frequency ratio (MasterClock : PClock : TClock) | Example [MHz]
00 | RFU | -
01 | 2:3:2 | 66.7 : 100 : 66.7
10 | 1:2:1 | 50 : 100 : 50
11 | 1:3:1 | 33.3 : 100 : 33.3






Table 2-2. Clock/Control Interface Signals (2/3)


DivMode (continued) | Internal operating frequency mode | Input

• VR4300 (µPD30200-133)
DivMode(1:0) | Frequency ratio (MasterClock : PClock : TClock) | Example [MHz]
00 | 1:4:1 | 33.3 : 133 : 33.3
01 | RFU | -
10 | 1:2:1 | 66.7 : 133 : 66.7
11 | 1:3:1 | 44.3 : 133 : 44.3

• VR4305 (µPD30200-80)
DivMode(1:0) | Frequency ratio (MasterClock : PClock : TClock) | Example [MHz]
00 | 1:1:1 | 66.7 : 66.7 : 66.7
01 | RFU | -
10 | 1:2:1 | 40 : 80 : 40
11 | 1:3:1 | 20 : 60 : 20

• VR4310 (µPD30210-133)
DivMode(2:0) | Frequency ratio (MasterClock : PClock : TClock) | Example [MHz]
000 | 1:5:1 | 26.7 : 133 : 26.7
001 | 1:6:1 | 22.2 : 133 : 22.2
010 | RFU | -
011 | 1:3:1 | 33.3 : 100 : 33.3
100 | 1:4:1 | 33.3 : 133 : 33.3
101 | RFU | -
110 | 1:2:1 | 50 : 100 : 50
111 | 1:3:1 | 33.3 : 100 : 33.3
DivMode (continued) | Internal operating frequency mode | Input

• VR4310 (µPD30210-167)
DivMode(2:0) | Frequency ratio (MasterClock : PClock : TClock) | Example [MHz]
000 | 1:5:1 | 33.3 : 167 : 33.3
001 | 1:6:1 | 27.8 : 167 : 27.8
010 | 2:5:2 | 66.7 : 167 : 66.7
011 | 1:3:1 | 33.3 : 100 : 33.3
100 | 1:4:1 | 33.3 : 133 : 33.3
101 | RFU | -
110 | 1:2:1 | 50 : 100 : 50
111 | 1:3:1 | 33.3 : 100 : 33.3
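Given a DivMode setting, the PClock frequency follows from the MasterClock frequency and the ratio in the tables above. The sketch below transcribes the VR4310 (µPD30210-167) ratios; the type and function names are hypothetical, frequencies are in kHz so that the arithmetic stays in integers, and the caller remains responsible for not exceeding the product's maximum PClock frequency.

```c
#include <stdint.h>

/* MasterClock : PClock ratio for one DivMode setting; {0, 0} marks RFU. */
typedef struct { unsigned master, pclock; } div_ratio;

/* VR4310 (uPD30210-167), DivMode(2:0) = 000 ... 111. TClock = MasterClock. */
static const div_ratio vr4310_167[8] = {
    {1, 5}, {1, 6}, {2, 5}, {1, 3}, {1, 4}, {0, 0}, {1, 2}, {1, 3},
};

/* Returns the PClock frequency in kHz, or 0 for an RFU setting. */
uint32_t pclock_khz(const div_ratio table[8], unsigned divmode,
                    uint32_t masterclock_khz)
{
    div_ratio r = table[divmode & 7u];
    if (r.master == 0)
        return 0;
    return (uint32_t)((uint64_t)masterclock_khz * r.pclock / r.master);
}
```

For example, DivMode(2:0) = 100 with a 33,333 kHz MasterClock gives a PClock of 133,332 kHz.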


2.2.3 Interrupt Interface Signals 


These signals are used by external devices to issue interrupt requests to the
VR4300. Table 2-3 shows the functions of these signals.


Table 2-3 Interrupt Interface Signals


Signal Name | Definition | I/O | Function
Int(4:0) | Interrupt request | Input | General-purpose interrupt request pins. These pins are ORed with bits 4 through 0 of the internal interrupt register.
NMI | Non-maskable interrupt request | Input | Accepts the non-maskable interrupt signal, which is ORed with bit 6 of the internal interrupt register.
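The OR relationship between the interrupt pins and the internal interrupt register can be expressed as a bit operation. This sketch treats each pin as an "asserted" flag and does not model the electrical polarity of the pins; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Int(4:0) are ORed into bits 4-0 of the internal interrupt register,
 * and NMI is ORed into bit 6. Other bits pass through unchanged. */
uint8_t effective_interrupt_bits(uint8_t internal_reg,
                                 uint8_t int_pins_asserted, bool nmi_asserted)
{
    uint8_t v = (uint8_t)(internal_reg | (int_pins_asserted & 0x1Fu));
    if (nmi_asserted)
        v = (uint8_t)(v | (1u << 6));
    return v;
}
```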




2.2.4 Joint Test Action Group (JTAG) Interface Signals 


These signals provide the JTAG boundary-scan interface. Table 2-4 shows the
functions of these signals.


Table 2-4 JTAG Interface Signals


Signal Name | Definition | I/O | Function
JTDI | JTAG data input | Input | Inputs data to be scanned serially.
JTCK | JTAG clock input | Input | Inputs a serial clock. JTDI and JTMS are sampled at the rising edge of this signal. Fix this signal to the low level when the JTAG interface is not used.
JTDO | JTAG data output | Output | Outputs serially scanned data.
JTMS | JTAG command | Input | Input a high level to this pin when the next serial data to be input is a JTAG command.


2.2.5 Initialization Interface Signals 


These signals are used when the external device initializes the operation 
parameters of the processor. Table 2-5 shows the functions of these signals. 


Table 2-5 Initialization Interface Signals


Signal Name | Definition | I/O | Function
ColdReset | Cold reset | Input | Asserted active at cold reset. SClock and TClock start cycling at the rising edge of this signal. This signal does not need to be asserted or deasserted in synchronization with MasterClock.
Reset | Reset | Input | At cold reset, assert and deassert this pin in synchronization with MasterClock, or keep it inactive. At soft reset, assert and deassert this pin in synchronization with MasterClock.
CPU Instruction Set Summary


This chapter is an overview of the central processing unit (CPU) instruction set; 
refer to Chapter 16 CPU Instruction Set Details for detailed descriptions of 
individual CPU instructions. 


Because the FPU instructions depend on the structure of the coprocessor, they are
described in Chapter 7 Floating-Point Operations and Chapter 17 FPU Instruction
Set Details.
3.1 CPU Instruction Formats


Each CPU instruction consists of a single 32-bit word, aligned on a word
boundary. There are three instruction formats, immediate (I-type), jump
(J-type), and register (R-type), as shown in Figure 3-1. Restricting instructions
to these three formats simplifies decoding. More complicated and less frequently
used operations and addressing modes are implemented by the compiler as
combinations of two or more instructions.


I-Type (Immediate)
op (bits 31-26) | rs (25-21) | rt (20-16) | immediate (15-0)


J-Type (Jump)
op (bits 31-26) | target (25-0)


R-Type (Register)
op (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | sa (10-6) | funct (5-0)


op 6-bit operation code
rs 5-bit source register number
rt 5-bit target (source/destination) register number or branch condition
immediate 16-bit immediate value, branch displacement, or address displacement
target 26-bit unconditional branch target address
rd 5-bit destination register number
sa 5-bit shift amount
funct 6-bit function field


Figure 3-1 CPU Instruction Formats
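The field layout in Figure 3-1 maps directly onto shifts and masks. The C sketch below decodes and encodes the three formats; the opcode and function values in the usage note (ADDU with funct 0x21, LW with op 0x23) come from the standard MIPS encoding rather than from this section, and all helper names are hypothetical.

```c
#include <stdint.h>

/* Field extraction for the I-, J-, and R-type formats. */
static inline unsigned insn_op(uint32_t w)     { return w >> 26; }
static inline unsigned insn_rs(uint32_t w)     { return (w >> 21) & 0x1F; }
static inline unsigned insn_rt(uint32_t w)     { return (w >> 16) & 0x1F; }
static inline unsigned insn_rd(uint32_t w)     { return (w >> 11) & 0x1F; }
static inline unsigned insn_sa(uint32_t w)     { return (w >> 6) & 0x1F; }
static inline unsigned insn_funct(uint32_t w)  { return w & 0x3F; }
static inline uint16_t insn_imm(uint32_t w)    { return (uint16_t)(w & 0xFFFF); }
static inline uint32_t insn_target(uint32_t w) { return w & 0x03FFFFFF; }

/* Encoders: each field is masked to its width before packing. */
uint32_t encode_i(unsigned op, unsigned rs, unsigned rt, uint16_t imm)
{
    return ((uint32_t)(op & 0x3F) << 26) | ((uint32_t)(rs & 0x1F) << 21) |
           ((uint32_t)(rt & 0x1F) << 16) | imm;
}

uint32_t encode_j(unsigned op, uint32_t target)
{
    return ((uint32_t)(op & 0x3F) << 26) | (target & 0x03FFFFFF);
}

uint32_t encode_r(unsigned op, unsigned rs, unsigned rt, unsigned rd,
                  unsigned sa, unsigned funct)
{
    return ((uint32_t)(op & 0x3F) << 26) | ((uint32_t)(rs & 0x1F) << 21) |
           ((uint32_t)(rt & 0x1F) << 16) | ((uint32_t)(rd & 0x1F) << 11) |
           ((uint32_t)(sa & 0x1F) << 6)  | (funct & 0x3F);
}
```

For example, encode_r(0, 1, 2, 3, 0, 0x21) yields 0x00221821 (ADDU with rd = 3, rs = 1, rt = 2), and insn_rt(0x8C820004) returns 2.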
Support of the MIPS ISA


Even though the VR4300 processor does not support a multiprocessor operating
environment, the synchronization support instructions defined in the MIPS II and
MIPS III ISA (the Load Linked and Store Conditional instructions) are
processed correctly in order to maintain compatibility with the VR4400 and VR4200.
The load link bit (LLbit) is set by the LL instruction, cleared by an ERET, and
tested by the SC instruction. The only other operation on the LLbit that is
implemented is a reset due to cache invalidation.


Caution All load/store instructions in this processor are executed in program
order, and the SYNC instruction is handled as a NOP.
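The LLbit rules above are what make the common LL/SC retry loop work: if anything clears the LLbit between the LL and the SC, the SC fails and the sequence is retried. The C sketch below models those rules in software only; it does not execute real LL/SC instructions, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the LLbit. */
typedef struct { bool llbit; } cpu_state;

/* LL: load the word and set the LLbit. */
uint32_t model_ll(cpu_state *cpu, const uint32_t *addr)
{
    cpu->llbit = true;
    return *addr;
}

/* SC: store only if the LLbit is set; the return value is what rt receives. */
uint32_t model_sc(cpu_state *cpu, uint32_t *addr, uint32_t value)
{
    if (!cpu->llbit)
        return 0;          /* no store; rt is cleared to 0 */
    *addr = value;
    return 1;              /* store performed; rt is set to 1 */
}

/* ERET clears the LLbit (cache invalidation would do the same). */
void model_eret(cpu_state *cpu) { cpu->llbit = false; }

/* The usual read-modify-write retry loop built from LL and SC. */
uint32_t atomic_increment(cpu_state *cpu, uint32_t *counter)
{
    uint32_t old;
    do {
        old = model_ll(cpu, counter);
    } while (model_sc(cpu, counter, old + 1) == 0);
    return old + 1;
}
```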


3.2 Instruction Classes 


The CPU instructions can be classified into six classes. 


3.2.1 Load/Store Instructions 


Load and store are immediate (I-type) instructions that move data between
memory and the general purpose registers. The only addressing mode available
for load/store instructions adds a 16-bit signed immediate offset to the contents
of the base register.
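The effective-address calculation can be written out as follows. This is a sketch for 64-bit register values; the sign extension is done explicitly so that the code does not rely on implementation-defined signed conversions, and the function name is hypothetical.

```c
#include <stdint.h>

/* Effective address = base + sign-extended 16-bit offset (modulo 2^64). */
uint64_t effective_address(uint64_t base, uint16_t offset)
{
    int64_t disp = (offset & 0x8000u) ? (int64_t)offset - 0x10000
                                      : (int64_t)offset;
    return base + (uint64_t)disp;
}
```

An offset field of 0xFFFC therefore means -4: with a base of 0x1000, the effective address is 0xFFC.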


Scheduling a Load Delay Slot 


A load instruction whose result cannot be used by the immediately following
instruction is called a delayed load instruction, and the instruction slot
immediately after a delayed load instruction is called the load delay slot. In the
VR4000 Series, the instruction immediately after a load may use the load
destination register; in that case, however, the hardware interlocks for the
necessary number of cycles. Therefore, although any instruction may be placed
in the load delay slot, scheduling it is recommended both to improve the
performance of the VR4300 and to maintain compatibility with the VR3000 Series
(for details, refer to Chapter 4 Pipeline).


Store Delay Slot 


In the VR4300 processor, a store instruction writing to the data cache keeps the
data cache busy during both its DC and WB stages. If the instruction immediately
following needs to access the data cache in its DC stage (e.g., a load instruction),
the hardware interlocks. Consequently, scheduling store delay slots can improve
performance.
Table 3-1 Number of Cycles for Load and Store Instruction Delay Slot


Instruction | PCycles Required
Load | 1
Store | 1
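The cost in Table 3-1 can be estimated by scanning adjacent instruction pairs. The sketch below is a simplified model with hypothetical types: it charges one PCycle when the instruction after a load reads the load's destination register, and one when the instruction after a store accesses the data cache. Treating any following load or store as a data-cache access is an assumption, and other stall sources are ignored.

```c
#include <stddef.h>

typedef enum { OP_ALU, OP_LOAD, OP_STORE } op_kind;

typedef struct {
    op_kind kind;
    int dest;       /* destination register, or -1 if none */
    int src[2];     /* source registers, -1 if unused */
} insn;

/* Counts interlock cycles caused by unfilled load/store delay slots. */
unsigned count_delay_slot_stalls(const insn *code, size_t n)
{
    unsigned stalls = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        const insn *cur = &code[i], *next = &code[i + 1];
        if (cur->kind == OP_LOAD && cur->dest >= 0 &&
            (next->src[0] == cur->dest || next->src[1] == cur->dest))
            stalls += 1;   /* load result used in the load delay slot */
        else if (cur->kind == OP_STORE &&
                 (next->kind == OP_LOAD || next->kind == OP_STORE))
            stalls += 1;   /* data cache still busy with the store */
    }
    return stalls;
}
```

Placing an independent instruction between the load and its consumer, as the text recommends, removes the stall.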


Defining Access Types 


The access type is the size of the data loaded or stored by the processor.


The op code of the load/store instruction determines the access type. Figure 3-2
shows the access types and the data to be loaded/stored. Regardless of the access
type and byte ordering (endianness), the address used by a load/store instruction
specifies the byte with the lowest address in the accessed field: the most
significant byte in big endian, and the least significant byte in little endian.


The bytes accessed within the doubleword are determined by the access type and
the low-order 3 bits of the address, as shown in Figure 3-2. Combinations of an
access type and low-order address bits other than those shown in Figure 3-2 are
prohibited; using any other combination causes an address error exception.


Table 3-2 lists the load/store instructions defined in the ISA, and Table 3-3 lists
the instructions of the extended ISA.
Access Type (Value) | Low-Order Address Bits (2 1 0) | Bytes Accessed (byte addresses within the doubleword)
Doubleword (7) | 000 | 0 to 7
Septibyte (6) | 000 / 001 | 0 to 6 / 1 to 7
Sextibyte (5) | 000 / 010 | 0 to 5 / 2 to 7
Quintibyte (4) | 000 / 011 | 0 to 4 / 3 to 7
Word (3) | 000 / 100 | 0 to 3 / 4 to 7
Triplebyte (2) | 000 / 001 / 100 / 101 | 0 to 2 / 1 to 3 / 4 to 6 / 5 to 7
Halfword (1) | 000 / 010 / 100 / 110 | 0 to 1 / 2 to 3 / 4 to 5 / 6 to 7
Byte (0) | 000 to 111 | the single byte at that address

In big endian, byte 0 occupies bits 63 to 56 of the doubleword and byte 7
occupies bits 7 to 0; in little endian, byte 0 occupies bits 7 to 0 and byte 7
occupies bits 63 to 56.


Figure 3-2. Byte Access within a Doubleword
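The permitted combinations in Figure 3-2 can be checked with one bit mask per access type. The sketch below encodes those combinations and the bit position of each byte for both byte orders; it is illustrative only, and the function names are hypothetical. The partial access types (triplebyte through septibyte) correspond to the unaligned load/store instructions such as LWL and LDR rather than to ordinary loads and stores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit k of each mask is set when low-order address bits (2:0) == k are
 * allowed. Index = access-type value: 0 = byte ... 7 = doubleword. */
static const uint8_t allowed_offsets[8] = {
    0xFF, /* 0 Byte:       000-111             */
    0x55, /* 1 Halfword:   000, 010, 100, 110  */
    0x33, /* 2 Triplebyte: 000, 001, 100, 101  */
    0x11, /* 3 Word:       000, 100            */
    0x09, /* 4 Quintibyte: 000, 011            */
    0x05, /* 5 Sextibyte:  000, 010            */
    0x03, /* 6 Septibyte:  000, 001            */
    0x01, /* 7 Doubleword: 000                 */
};

/* Returns false where the processor would raise an address error. */
bool access_allowed(unsigned access_type, uint64_t addr)
{
    return access_type < 8 &&
           ((allowed_offsets[access_type] >> (addr & 7u)) & 1u);
}

/* Lowest bit position of the byte at offset k (0-7) in the doubleword. */
unsigned byte_lane_shift(unsigned k, bool big_endian)
{
    return big_endian ? (7u - k) * 8u : k * 8u;
}
```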
Table 3-2. Load/Store Instructions (1/2)
Instruction Format and Description| op | base rt offset |

Load Byte LB rt, offset (base) 
Generates an address by adding a sign-extended offset to the contents of 
register base. 
Sign-extends the contents of a byte specified by the address and loads the 
result to register rt. 

Load Byte Unsigned LBU rt, offset (base)
Generates an address by adding a sign-extended offset to the contents of
register base. 
Zero-extends the contents of a byte specified by the address and loads the 
result to register rt. 

Load Halfword LH rt, offset (base) 
Generates an address by adding a sign-extended offset to the contents of 
register base. 
Sign-extends the contents of a halfword specified by the address and loads 
the result to register rt. 

Load Halfword Unsigned LHU rt, offset (base)
Generates an address by adding a sign-extended offset to the contents of
register base.
Zero-extends the contents of a halfword specified by the address and loads 
the result to register rt. 

Load Word LW rt, offset (base) 
Generates an address by adding a sign-extended offset to the contents of 
register base. 
Sign-extends (in the 64-bit mode) the contents of the word specified by the
address and loads the result to register rt.

Load Word Left | LWL rt, offset (base) 
Generates an address by adding a sign-extended offset to the contents of 
register base. 
Shifts a word specified by the address to the left, so that a byte specified by 
the address is at the leftmost position of the word. Sign-extends (in the 64- 
bit mode), merges the result of the shift and the contents of register rt, and 
loads the result to register rt. 

Load Word Right | LWR rt, offset (base) 
Generates an address by adding a sign-extended offset to the contents of 
register base. 
Shifts a word specified by the address to the right, so that a byte specified by 
the address is at the rightmost position of the word. Sign-extends (in the 64- 
bit mode), merges the result of the shift and the contents of register rt, and 
loads the result to register rt. 
Table 3-2. Load/Store Instructions (2/2)


Instruction Format and Description| op | base rt offset |


Store Byte 


SB rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Stores the contents of the low-order byte of register rt to the memory 
specified by the address. 


Store Halfword 


SH rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Stores the contents of the low-order halfword of register rt to the memory 
specified by the address. 


Store Word 


SW rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Stores the contents of the low-order word of register rt to the memory 
specified by the address. 


Store Word Left 


SWL rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Shifts the contents of register rt to the right so that the leftmost byte of the 
word is at the position of the byte specified by the address. Stores the result 
of the shift to the lower portion of the word in memory. 


Store Word Right 


SWR rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Shifts the contents of register rt to the left so that the rightmost byte of the 
word is at the position of the byte specified by the address. Stores the result 
of the shift to the higher portion of the word in memory. 
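The extension rules and the left/right instructions in Table 3-2 can be emulated in C, which also shows how LWL/LWR and SWL/SWR combine to access an unaligned word. This sketch assumes big-endian byte ordering and 32-bit register values (the 64-bit sign extension performed by LWL/LWR is omitted), and the function names are hypothetical. In big endian, an unaligned word at address A is loaded with LWL at A followed by LWR at A + 3, and stored with SWL at A followed by SWR at A + 3.

```c
#include <stdint.h>

/* Extension rules (64-bit results), without implementation-defined casts. */
int64_t ext_lb(uint8_t b)    { return (b & 0x80u) ? (int64_t)b - 0x100 : (int64_t)b; }
uint64_t ext_lbu(uint8_t b)  { return b; }
int64_t ext_lh(uint16_t h)   { return (h & 0x8000u) ? (int64_t)h - 0x10000 : (int64_t)h; }
uint64_t ext_lhu(uint16_t h) { return h; }
int64_t ext_lw(uint32_t w)
{
    return (w & 0x80000000u) ? (int64_t)w - INT64_C(0x100000000) : (int64_t)w;
}

/* LWL: bytes from addr to the end of its word fill rt from the left. */
uint32_t model_lwl(const uint8_t *mem, uint32_t addr, uint32_t rt)
{
    unsigned k = addr & 3u;
    for (unsigned i = 0; i < 4u - k; i++) {
        unsigned shift = 24u - 8u * i;
        rt = (rt & ~(UINT32_C(0xFF) << shift)) | ((uint32_t)mem[addr + i] << shift);
    }
    return rt;
}

/* LWR: bytes from the start of the word to addr fill rt from the right. */
uint32_t model_lwr(const uint8_t *mem, uint32_t addr, uint32_t rt)
{
    unsigned k = addr & 3u;
    for (unsigned i = 0; i <= k; i++) {
        unsigned shift = 8u * i;
        rt = (rt & ~(UINT32_C(0xFF) << shift)) | ((uint32_t)mem[addr - i] << shift);
    }
    return rt;
}

/* SWL: the leftmost bytes of rt go to addr .. end of the word. */
void model_swl(uint8_t *mem, uint32_t addr, uint32_t rt)
{
    unsigned k = addr & 3u;
    for (unsigned i = 0; i < 4u - k; i++)
        mem[addr + i] = (uint8_t)(rt >> (24u - 8u * i));
}

/* SWR: the rightmost bytes of rt go to start of the word .. addr. */
void model_swr(uint8_t *mem, uint32_t addr, uint32_t rt)
{
    unsigned k = addr & 3u;
    for (unsigned i = 0; i <= k; i++)
        mem[addr - i] = (uint8_t)(rt >> (8u * i));
}
```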
Table 3-3 Load/Store Instructions (Extended ISA) (1/2)
Instruction Format and Description| op | base rt offset |

Load Doubleword | LD rt, offset (base) 
Generates an address by adding a sign-extended offset to the contents of 
register base. 
Loads the contents of the doubleword specified by the address to register rt. 

Load Doubleword Left | LDL rt, offset (base)
Generates an address by adding a sign-extended offset to the contents of
register base. 
Shifts the doubleword specified by the address to the left so that the byte 
specified by the address is at the leftmost position of the doubleword. 
Merges the result of the shift and the contents of register rt, and loads the 
result to register rt. 

Load Doubleword Right | LDR rt, offset (base)
Generates an address by adding a sign-extended offset to the contents of
register base. 
Shifts the doubleword specified by the address to the right so that the byte 
specified by the address is at the rightmost position of the doubleword. 
Merges the result of the shift and the contents of register rt, and loads the 
result to register rt. 

Load Linked LL rt, offset (base) 
Generates an address by adding a sign-extended offset to the contents of 
register base. 
Loads the contents of the word specified by the address to register rt and sets
the LL bit to 1. 

Load Linked Doubleword | LLD rt, offset (base)
Generates an address by adding a sign-extended offset to the contents of
register base. 
Loads the contents of the doubleword specified by the address to register rt 
and sets the LL bit to 1. 

Load Word Unsigned | LWU rt, offset (base)
Generates an address by adding a sign-extended offset to the contents of
register base. 
Zero-extends the contents of the word specified by the address, and loads the 
result to register rt. 
Table 3-3 Load/Store Instructions (Extended ISA) (2/2)


Instruction Format and Description| op | base rt offset |


Store Conditional 


SC rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

If the LL bit is 1, stores the contents of the low-order word of register rt to 
the memory specified by the address, and sets register rt to 1. 

If the LL bit is 0, does not store the contents of the word, and clears register 
rt to 0. 


Store Conditional 
Doubleword 


SCD rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

If the LL bit is 1, stores the contents of register rt to the memory specified by 
the address, and sets register rt to 1. 

If the LL bit is 0, does not store the contents of the register, and clears register 
rt to 0. 


Store Doubleword 


SD rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Stores the contents of register rt to the memory specified by the address. 


Store Doubleword 
Left 


SDL rt, offset (base) 

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Shifts the contents of register rt to the right so that the leftmost byte of a 
doubleword is at the position of the byte specified by the address. Stores the 
result of the shift to the lower portion of the doubleword in memory. 


Store 
Doubleword 
Right 


SDR rt, offset (base)

Generates an address by adding a sign-extended offset to the contents of 
register base. 

Shifts the contents of register rt to the left so that the rightmost byte of a 
doubleword is at the position of the byte specified by the address. Stores the 
result of the shift to the higher portion of the doubleword in memory. 
3.2.2 Computational Instructions
Computational instructions perform arithmetic, multiply/divide, logical, and
shift operations on register values. These instructions are classified into two
formats: R-type and I-type. R-type instructions use registers for both source
operands, while I-type instructions use an immediate value as one of the sources.
By kind of operation, the computational instructions are divided into the
following four types.

(1) ALU immediate instructions (Refer to Tables 3-4 and 3-5.) 

(2) 3-operand type instructions (Refer to Tables 3-6 and 3-7.) 

(3) Shift instructions (Refer to Tables 3-8 and 3-9.) 


(4) Multiply/Divide instructions (Refer to Tables 3-10 and 3-11.) 


To keep data compatible between the 64-bit and 32-bit modes, 32-bit operands
must be correctly sign-extended to 64 bits. Otherwise, the result of a 32-bit
operation is meaningless.
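The requirement above can be checked mechanically: a 64-bit register holds a correctly sign-extended 32-bit operand only when bits 63 through 32 all equal bit 31. A minimal C sketch, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bits 63..31 are all 0 or all 1. */
bool is_sign_extended_32(uint64_t reg)
{
    uint64_t upper = reg >> 31;                 /* 33 bits: 63..31 */
    return upper == 0 || upper == UINT64_C(0x1FFFFFFFF);
}

/* Produces the correctly sign-extended form of a 32-bit value. */
uint64_t sign_extend_32(uint32_t w)
{
    return (w & 0x80000000u) ? ((uint64_t)w | UINT64_C(0xFFFFFFFF00000000))
                             : (uint64_t)w;
}
```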
Table 3-4 ALU Immediate Instructions


Instruction Format and Description| op rs rt immediate | 
Add Immediate ADDI rt, rs, immediate 
Sign-extends the 16-bit immediate and adds it to register rs. Stores the 
32-bit result to register rt (sign-extends the result in the 64-bit mode). 
Generates an exception if a 2's complement integer overflow occurs. 
Add Immediate Unsigned | ADDIU rt, rs, immediate
Sign-extends the 16-bit immediate and adds it to register rs. Stores the 32-bit
result to register rt (sign-extends the result in the 64-bit mode). Does not
generate an exception even if an integer overflow occurs.


Set On Less Than 
Immediate 


SLTI rt, rs, immediate 

Sign-extends the 16-bit immediate and compares it with register rs as a 
signed integer. If rs is less than the immediate, stores 1 to register rt;
otherwise, stores 0 to register rt. 


Set On Less Than 
Immediate 
Unsigned 


SLTIU rt, rs, immediate 

Sign-extends the 16-bit immediate and compares it with register rs as an 
unsigned integer. If rs is less than the immediate, stores 1 to register rt;
otherwise, stores 0 to register rt. 


And Immediate 


ANDI rt, rs, immediate 
Zero-extends the 16-bit immediate, ANDs it with register rs, and stores the 
result to register rt. 


Or Immediate 


ORI rt, rs, immediate 
Zero-extends the 16-bit immediate, ORs it with register rs, and stores the 
result to register rt. 


Exclusive Or Immediate


XORI rt, rs, immediate
Zero-extends the 16-bit immediate, exclusive-ORs it with register rs, and
stores the result to register rt.

Load Upper Immediate LUI rt, immediate
Shifts the 16-bit immediate 16 bits to the left, and clears the low-order 16 bits
of the word to 0.
Stores the result to register rt (by sign-extending the result in the 64-bit
mode).
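Several entries in Table 3-4 differ only in how the 16-bit immediate is extended or compared. The sketch below models ADDI, SLTI, SLTIU, ANDI, and LUI on 64-bit register values; the names are hypothetical, ADDI reports an overflow by returning false instead of raising an exception, and its source operand is assumed to be a correctly sign-extended 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

static int64_t sext16(uint16_t imm)
{
    return (imm & 0x8000u) ? (int64_t)imm - 0x10000 : (int64_t)imm;
}

/* ADDI: 32-bit add; false where an integer overflow exception occurs
 * (rt is then left unchanged). */
bool model_addi(int32_t rs, uint16_t imm, int64_t *rt)
{
    int64_t sum = (int64_t)rs + sext16(imm);
    if (sum < INT32_MIN || sum > INT32_MAX)
        return false;
    *rt = sum;                     /* 32-bit result, sign-extended */
    return true;
}

/* SLTI compares as signed; SLTIU sign-extends, then compares as unsigned. */
uint64_t model_slti(int64_t rs, uint16_t imm)   { return rs < sext16(imm); }
uint64_t model_sltiu(uint64_t rs, uint16_t imm) { return rs < (uint64_t)sext16(imm); }

/* ANDI zero-extends the immediate. */
uint64_t model_andi(uint64_t rs, uint16_t imm)  { return rs & imm; }

/* LUI: immediate << 16, sign-extended to 64 bits. */
int64_t model_lui(uint16_t imm) { return sext16(imm) * 65536; }
```

For example, with rs = 5 and an immediate of 0xFFFF, SLTI yields 0 (5 < -1 is false), but SLTIU yields 1 because the sign-extended immediate is the largest unsigned value.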
Table 3-5 ALU Immediate Instruction (Extended ISA)


Instruction Format and Description| op rs rt immediate | 

Doubleword Add Immediate | DADDI rt, rs, immediate
Sign-extends the 16-bit immediate to 64 bits, and adds it to register rs. Stores
the 64-bit result to register rt. Generates an exception if an integer overflow
occurs.

Doubleword Add Immediate Unsigned | DADDIU rt, rs, immediate
Sign-extends the 16-bit immediate to 64 bits, and adds it to register rs. Stores
the 64-bit result to register rt. Does not generate an exception even if an
integer overflow occurs.
Table 3-6 Three-Operand Type Instruction


Instruction Format and Description| op rs rt rd sa_| funct | 
Add ADD rd, rs, rt
Adds the contents of registers rs and rt, and stores (sign-extends in the 64-bit
mode) the 32-bit result to register rd.
Generates an exception if an integer overflow occurs.
Add Unsigned ADDU rd, rs, rt
Adds the contents of registers rs and rt, and stores (sign-extends in the 64-bit
mode) the 32-bit result to register rd.
Does not generate an exception even if an integer overflow occurs.
Subtract SUB rd, rs, rt
Subtracts the contents of register rt from register rs, and stores (sign-extends
in the 64-bit mode) the 32-bit result to register rd.
Generates an exception if an integer overflow occurs.
Subtract Unsigned SUBU rd, rs, rt
Subtracts the contents of register rt from register rs, and stores (sign-extends
in the 64-bit mode) the 32-bit result to register rd.
Does not generate an exception even if an integer overflow occurs.


Set On Less Than 


SLT rd, rs, rt 

Compares the contents of registers rs and rt as signed integers. 

If the contents of register rs are less than those of rt, stores 1 to register rd;
otherwise, stores 0 to rd. 


Set On Less Than 
Unsigned 


SLTU rd, rs, rt 

Compares the contents of registers rs and rt as unsigned integers. 

If the contents of register rs are less than those of rt, stores 1 to register rd;
otherwise, stores 0 to rd. 


And 


AND rd, rs, rt 
ANDs the contents of registers rs and rt in bit units, and stores the result to 
register rd. 


Or


OR rd, rs, rt
ORs the contents of registers rs and rt in bit units, and stores the result to 
register rd. 


Exclusive Or 


XOR rd, rs, rt 
Exclusive-ORs the contents of registers rs and rt in bit units, and stores the 
result to register rd. 


Nor 


NOR rd, rs, rt 
NORs the contents of registers rs and rt in bit units, and stores the result to 
register rd. 
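The operand order of SUB and SUBU, and the difference between SLT and SLTU, are easiest to see in code. In this sketch (hypothetical names), SUB computes rs - rt on 32-bit operands and reports an overflow by returning false, SUBU wraps without trapping, and the comparisons return 1 or 0 as the instructions do.

```c
#include <stdbool.h>
#include <stdint.h>

/* SUB rd, rs, rt: rs - rt; false where an overflow exception occurs. */
bool model_sub(int32_t rs, int32_t rt, int64_t *rd)
{
    int64_t diff = (int64_t)rs - rt;
    if (diff < INT32_MIN || diff > INT32_MAX)
        return false;              /* rd is left unchanged */
    *rd = diff;
    return true;
}

/* SUBU: same difference modulo 2^32, sign-extended, never traps. */
int64_t model_subu(uint32_t rs, uint32_t rt)
{
    uint32_t w = rs - rt;
    return (w & 0x80000000u) ? (int64_t)w - INT64_C(0x100000000) : (int64_t)w;
}

/* SLT and SLTU differ only in how the same bit patterns are compared. */
uint64_t model_slt(int64_t rs, int64_t rt)    { return rs < rt; }
uint64_t model_sltu(uint64_t rs, uint64_t rt) { return rs < rt; }
```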
Table 3-7 Three-Operand Type Instructions (Extended ISA)
Instruction Format and Description| op rs rt rd sa_| funct |
Doubleword Add DADD rd, rs, rt
Adds the contents of registers rs and rt, and stores the 64-bit result to register
rd.
Generates an exception if an integer overflow occurs.
Doubleword Add Unsigned DADDU rd, rs, rt
Adds the contents of registers rs and rt, and stores the 64-bit result to register
rd.
Does not generate an exception even if an integer overflow occurs.
Doubleword Subtract DSUB rd, rs, rt
Subtracts the contents of register rt from register rs, and stores the 64-bit
result to register rd.
Generates an exception if an integer overflow occurs.
Doubleword Subtract Unsigned DSUBU rd, rs, rt
Subtracts the contents of register rt from register rs, and stores the 64-bit
result to register rd.
Does not generate an exception even if an integer overflow occurs.
Table 3-8 Shift Instructions

Instruction | Format and Description | op | rs | rt | rd | sa | funct
Shift Left Logical SLL rd, rt, sa
Shifts the contents of register rt sa bits to the left, and inserts 0 to the
low-order bits.
Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.
Shift Right Logical SRL rd, rt, sa
Shifts the contents of register rt sa bits to the right, and inserts 0 to the
high-order bits.
Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.
Shift Right Arithmetic SRA rd, rt, sa
Shifts the contents of register rt sa bits to the right, and sign-extends the
high-order bits.
Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.
Shift Left Logical Variable SLLV rd, rt, rs
Shifts the contents of register rt to the left, and inserts 0 to the low-order bits.
The number of bits by which the register contents are to be shifted is
specified by the low-order 5 bits of register rs.
Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.
Shift Right Logical Variable SRLV rd, rt, rs
Shifts the contents of register rt to the right, and inserts 0 to the high-order
bits.
The number of bits by which the register contents are to be shifted is
specified by the low-order 5 bits of register rs.
Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.
Shift Right Arithmetic Variable SRAV rd, rt, rs
Shifts the contents of register rt to the right, and sign-extends the high-order
bits.
The number of bits by which the register contents are to be shifted is
specified by the low-order 5 bits of register rs.
Sign-extends (in the 64-bit mode) the 32-bit result and stores it to register rd.
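The 64-bit-mode behaviour of the 32-bit shifts can be sketched in C as follows. This is an illustrative model with hypothetical names: it shifts only the low-order 32 bits of rt, masks the variable shift amount to the low-order 5 bits of rs, and sign-extends the 32-bit result. Note that even SLL and SRL can leave a "negative" 64-bit value when bit 31 of the result is set.

```c
#include <stdint.h>

/* Sign-extend a 32-bit result into a 64-bit register value. */
static uint64_t sext32(uint32_t v) {
    return (uint64_t)(int64_t)(int32_t)v;
}

static uint64_t sll(uint64_t rt, unsigned sa) { return sext32((uint32_t)rt << (sa & 31)); }
static uint64_t srl(uint64_t rt, unsigned sa) { return sext32((uint32_t)rt >> (sa & 31)); }

static uint64_t sra(uint64_t rt, unsigned sa) {
    /* Arithmetic shift: replicate bit 31 into the vacated high-order bits,
       written without relying on >> of a negative signed value. */
    uint32_t v = (uint32_t)rt;
    sa &= 31;
    uint32_t r = v >> sa;
    if ((v & 0x80000000u) && sa != 0)
        r |= ~(0xFFFFFFFFu >> sa);
    return sext32(r);
}

/* Variable forms take the shift amount from the low-order 5 bits of rs. */
static uint64_t sllv(uint64_t rt, uint64_t rs) { return sll(rt, (unsigned)(rs & 31)); }
static uint64_t srlv(uint64_t rt, uint64_t rs) { return srl(rt, (unsigned)(rs & 31)); }
static uint64_t srav(uint64_t rt, uint64_t rs) { return sra(rt, (unsigned)(rs & 31)); }
```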
Table 3-9 Shift Instructions (Extended ISA) (1/2)

Instruction | Format and Description | op | rs | rt | rd | sa | funct
Doubleword Shift Left Logical DSLL rd, rt, sa
Shifts the contents of register rt sa bits to the left, and inserts 0 to the
low-order bits.
Stores the 64-bit result to register rd.

Doubleword Shift Right Logical DSRL rd, rt, sa
Shifts the contents of register rt sa bits to the right, and inserts 0 to the
high-order bits.
Stores the 64-bit result to register rd.

Doubleword Shift Right Arithmetic DSRA rd, rt, sa
Shifts the contents of register rt sa bits to the right, and sign-extends the
high-order bits.
Stores the 64-bit result to register rd.

Doubleword Shift Left Logical Variable DSLLV rd, rt, rs
Shifts the contents of register rt to the left, and inserts 0 to the low-order bits.
The number of bits by which the register contents are to be shifted is
specified by the low-order 6 bits of register rs.
Stores the 64-bit result to register rd.

Doubleword Shift Right Logical Variable DSRLV rd, rt, rs
Shifts the contents of register rt to the right, and inserts 0 to the high-order bits.
The number of bits by which the register contents are to be shifted is
specified by the low-order 6 bits of register rs.
Stores the 64-bit result to register rd.

Doubleword Shift Right Arithmetic Variable DSRAV rd, rt, rs
Shifts the contents of register rt to the right, and sign-extends the high-order
bits.
The number of bits by which the register contents are to be shifted is
specified by the low-order 6 bits of register rs.
Stores the 64-bit result to register rd.

Doubleword Shift Left Logical + 32 DSLL32 rd, rt, sa
Shifts the contents of register rt 32+sa bits to the left, and inserts 0 to the
low-order bits.
Stores the 64-bit result to register rd.

Doubleword Shift Right Logical + 32 DSRL32 rd, rt, sa
Shifts the contents of register rt 32+sa bits to the right, and inserts 0 to the
high-order bits.
Stores the 64-bit result to register rd.
Table 3-9 Shift Instructions (Extended ISA) (2/2)

Instruction | Format and Description | op | rs | rt | rd | sa | funct

Doubleword Shift Right Arithmetic + 32 DSRA32 rd, rt, sa
Shifts the contents of register rt 32+sa bits to the right, and sign-extends the
high-order bits.
Stores the 64-bit result to register rd.
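A C sketch of the doubleword shifts (hypothetical names). Because the sa field holds only 5 bits (0 to 31), the "+ 32" forms exist so that fixed shifts of 32 to 63 bits can be encoded; the variable forms instead take the low-order 6 bits of rs.

```c
#include <stdint.h>

static uint64_t dsll(uint64_t rt, unsigned sa) { return rt << (sa & 63); }
static uint64_t dsrl(uint64_t rt, unsigned sa) { return rt >> (sa & 63); }

static uint64_t dsra(uint64_t rt, unsigned sa) {
    sa &= 63;
    uint64_t r = rt >> sa;
    if ((rt >> 63) && sa != 0)
        r |= ~(UINT64_MAX >> sa);   /* fill vacated bits with copies of bit 63 */
    return r;
}

/* "+ 32" forms: the 5-bit sa is added to 32. */
static uint64_t dsll32(uint64_t rt, unsigned sa) { return dsll(rt, 32 + (sa & 31)); }
static uint64_t dsrl32(uint64_t rt, unsigned sa) { return dsrl(rt, 32 + (sa & 31)); }
static uint64_t dsra32(uint64_t rt, unsigned sa) { return dsra(rt, 32 + (sa & 31)); }

/* Variable forms: shift amount from the low-order 6 bits of rs. */
static uint64_t dsllv(uint64_t rt, uint64_t rs) { return dsll(rt, (unsigned)(rs & 63)); }
static uint64_t dsrlv(uint64_t rt, uint64_t rs) { return dsrl(rt, (unsigned)(rs & 63)); }
static uint64_t dsrav(uint64_t rt, uint64_t rs) { return dsra(rt, (unsigned)(rs & 63)); }
```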

Table 3-10 Multiply/Divide Instructions 
Instruction | Format and Description | op | rs | rt | rd | sa | funct

Multiply MULT rs, rt
Multiplies the contents of register rs by the contents of register rt, treating
both as 32-bit signed integers. Stores the 64-bit result to special registers HI
(high-order word) and LO (low-order word), sign-extending each word in the
64-bit mode.

Multiply Unsigned MULTU rs, rt
Multiplies the contents of register rs by the contents of register rt, treating
both as 32-bit unsigned integers. Stores the 64-bit result to special registers
HI (high-order word) and LO (low-order word), sign-extending each word in the
64-bit mode.

Divide DIV rs, rt
Divides the contents of register rs by the contents of register rt, treating both
as 32-bit signed integers. Stores the 32-bit quotient to special register LO and
the 32-bit remainder to special register HI, sign-extending each in the 64-bit
mode.

Divide Unsigned DIVU rs, rt
Divides the contents of register rs by the contents of register rt, treating both
as 32-bit unsigned integers. Stores the 32-bit quotient to special register LO and
the 32-bit remainder to special register HI, sign-extending each in the 64-bit
mode.

Move From HI MFHI rd 
Transfers the contents of special register HI to register rd. 

Move From LO MFLO rd 
Transfers the contents of special register LO to register rd. 

Move To HI MTHI rs 
Transfers the contents of register rs to special register HI. 

Move To LO MTLO rs 
Transfers the contents of register rs to special register LO. 
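The HI/LO conventions of Table 3-10 can be modelled in C as below. The `hilo_t` struct and the function names are assumptions made for illustration. The sketch does not model division by zero or the INT32_MIN / -1 case, both of which are undefined behaviour in C, so callers must avoid them.

```c
#include <stdint.h>

/* Hypothetical model of the HI and LO special registers. */
typedef struct { uint64_t hi, lo; } hilo_t;

static uint64_t sext32(uint32_t v) { return (uint64_t)(int64_t)(int32_t)v; }

/* MULT: signed 32 x 32 -> 64-bit product; high-order word to HI,
   low-order word to LO, each sign-extended to 64 bits. */
static hilo_t mult(uint64_t rs, uint64_t rt) {
    int64_t p = (int64_t)(int32_t)(uint32_t)rs * (int64_t)(int32_t)(uint32_t)rt;
    uint64_t up = (uint64_t)p;
    hilo_t r = { sext32((uint32_t)(up >> 32)), sext32((uint32_t)up) };
    return r;
}

/* MULTU: unsigned operands, but each result word is still sign-extended. */
static hilo_t multu(uint64_t rs, uint64_t rt) {
    uint64_t p = (uint64_t)(uint32_t)rs * (uint64_t)(uint32_t)rt;
    hilo_t r = { sext32((uint32_t)(p >> 32)), sext32((uint32_t)p) };
    return r;
}

/* DIV: quotient to LO, remainder to HI. C's / and % truncate toward zero,
   so the remainder takes the sign of the dividend. */
static hilo_t div32(uint64_t rs, uint64_t rt) {
    int32_t a = (int32_t)(uint32_t)rs, b = (int32_t)(uint32_t)rt;
    hilo_t r = { sext32((uint32_t)(a % b)), sext32((uint32_t)(a / b)) };
    return r;
}
```

After `mult`, MFHI and MFLO would read `r.hi` and `r.lo` respectively in this model.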
Table 3-11 Multiply/Divide Instructions (Extended ISA)

Instruction | Format and Description | op | rs | rt | rd | sa | funct
Doubleword Multiply DMULT rs, rt
Multiplies the contents of register rs by the contents of register rt as signed
integers.
Stores the 128-bit result to special registers HI and LO.
Doubleword Multiply Unsigned DMULTU rs, rt
Multiplies the contents of register rs by the contents of register rt as
unsigned integers.
Stores the 128-bit result to special registers HI and LO.
Doubleword Divide DDIV rs, rt
Divides the contents of register rs by the contents of register rt.
The operands are treated as signed integers.
Stores the 64-bit quotient to special register LO, and the 64-bit remainder to
special register HI.

Doubleword Divide Unsigned DDIVU rs, rt
Divides the contents of register rs by the contents of register rt.
The operands are treated as unsigned integers.
Stores the 64-bit quotient to special register LO, and the 64-bit remainder to
special register HI.
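For DMULT and DMULTU the 128-bit product does not fit in a standard C integer type, so this sketch (hypothetical names) forms it from four 32-bit partial products, with HI receiving the high-order 64 bits and LO the low-order 64 bits. The signed DMULT result is derived from the unsigned one by subtracting each operand from the high word when the other operand is negative, a standard two's-complement identity.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } hilo_t;  /* hypothetical HI/LO model */

/* DMULTU: 64 x 64 -> 128-bit unsigned product from 32-bit partial products,
   so no compiler-specific 128-bit type is needed. */
static hilo_t dmultu(uint64_t rs, uint64_t rt) {
    uint64_t a_lo = rs & 0xFFFFFFFFu, a_hi = rs >> 32;
    uint64_t b_lo = rt & 0xFFFFFFFFu, b_hi = rt >> 32;

    uint64_t ll = a_lo * b_lo;   /* each partial product fits in 64 bits */
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;

    /* Sum the middle 32-bit column; its carry goes into the high word. */
    uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFu) + (hl & 0xFFFFFFFFu);

    hilo_t r;
    r.lo = (mid << 32) | (ll & 0xFFFFFFFFu);
    r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return r;
}

/* DMULT: correct the unsigned high word for negative operands. */
static hilo_t dmult(uint64_t rs, uint64_t rt) {
    hilo_t r = dmultu(rs, rt);
    if (rs >> 63) r.hi -= rt;
    if (rt >> 63) r.hi -= rs;
    return r;
}
```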


When an integer multiply or divide instruction is executed, the VR4300 stalls the
entire pipeline. The number of processor cycles (PCycles) stalled is shown
below.


Table 3-12 Number of Cycles Stalled by Multiply/Divide Instructions


Instruction               | MULT | MULTU | DIV | DIVU | DMULT | DMULTU | DDIV | DDIVU
Number of required cycles |


76 User’s Manual U10504EJ7VOUMOO 


CPU Instruction Set Summary 


3.2.3 Jump/Branch Instructions


The jump and branch instructions change the flow of the program. Every jump
and branch instruction has one delay slot: the instruction immediately
following a jump or branch instruction (the instruction in the delay slot) is
executed while the first instruction at the destination is fetched from memory.


Instructions involving link, such as JAL and BLTZAL, store the return address to 
register r31. 


Table 3-13 Number of Delay Slot Cycles of Jump/Branch Instructions


Instruction | Number of Required Cycles
Branch      | 1
Jump        | 1


Outline of Jump Instruction 


Subroutine calls written in a high-level language usually use the J or JAL
instruction. The J and JAL instructions are J-type instructions. An instruction of
this type shifts a 26-bit target address 2 bits to the left and combines it with the
high-order 4 bits of the current program counter to generate a 32- or 64-bit
absolute address.
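The J/JAL target calculation can be sketched in C as below (function name hypothetical). Two assumptions go beyond the text above and follow the general MIPS definition: the PC used is the address of the instruction in the delay slot, and in the 64-bit mode every PC bit above bit 27 is kept (in the 32-bit mode these are exactly the high-order 4 bits).

```c
#include <stdint.h>

/* Hypothetical helper: compute the J/JAL destination from the delay-slot
   PC and the raw 32-bit instruction word. */
static uint64_t jump_target(uint64_t delay_slot_pc, uint32_t instr) {
    uint64_t index = instr & 0x03FFFFFFu;            /* 26-bit target field */
    return (delay_slot_pc & ~(uint64_t)0x0FFFFFFF)   /* keep high-order PC bits */
         | (index << 2);                             /* word index -> byte address */
}
```

Because the high-order PC bits are kept, J and JAL can only reach addresses within the same 256-Mbyte region as the delay slot; JR and JALR are used to go further.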


To return, dispatch, or jump between pages, the JR or JALR instruction is usually
used. Both are R-type instructions and use the 32- or 64-bit byte address held in
a general purpose register.


For details, refer to Chapter 16 CPU Instruction Set Details. 


Outline of Branch Instruction 


A branch instruction specifies a signed 16-bit offset relative to the program
counter. Branch instructions involving link, such as BLTZAL, store the return
address to register r31.


Table 3-14 lists the jump instructions, and Table 3-15 shows the branch 
instructions. Table 3-16 lists the branch instructions of the extended ISA. 
Table 3-14 Jump Instructions


Instruction | Format and Description | op | target
Jump J target
Shifts the 26-bit target address 2 bits to the left, and jumps to the address
formed by combining it with the high-order 4 bits of the PC, delayed by one
instruction.
Jump And Link JAL target
Shifts the 26-bit target address 2 bits to the left, and jumps to the address
formed by combining it with the high-order 4 bits of the PC, delayed by one
instruction.
Stores the address of the instruction following the delay slot to r31 (link
register).


Instruction | Format and Description | op | rs | rt | rd | sa | funct
Jump Register JR rs
Jumps to the address in register rs, delayed by one instruction.
Jump And Link Register JALR rs, rd
Jumps to the address in register rs, delayed by one instruction.
Stores the address of the instruction following the delay slot to register rd.


The following rules apply to both Table 3-15 and Table 3-16.


Branch Address 


The branch address of every branch instruction is calculated by shifting the
16-bit offset 2 bits to the left, sign-extending it to 64 bits, and adding it to
the address of the instruction in the delay slot. All the branch instructions
generate one delay slot.
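The branch-address rule above can be sketched in C as follows (function name hypothetical). The delay-slot instruction is assumed to sit at the branch instruction's address plus 4; the offset is multiplied by 4 rather than left-shifted so that negative offsets stay well defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: branch destination from the branch's own address
   and its raw 16-bit offset field. */
static uint64_t branch_target(uint64_t branch_pc, uint16_t offset) {
    int64_t disp = (int64_t)(int16_t)offset * 4;   /* sign-extend, then x4 */
    return branch_pc + 4 + (uint64_t)disp;         /* relative to the delay slot */
}
```

For example, an offset of 0xFFFF (-1) makes a branch target itself, and an offset of 3 from 0x80000100 reaches 0x80000110.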


Operation during No Branch (Table 3-16) 
If the branch condition of a branch likely instruction is not satisfied, the
instruction in the delay slot is invalidated. For all the other branch
instructions, the instruction in the delay slot is executed unconditionally.


Remark The instruction at the branch destination is fetched in the EX stage of
the branch instruction. The branch comparison and the target address
calculation are performed in phase 2 of the RF stage and phase 1 of
the EX stage of the branch instruction. One cycle of the branch delay
slot defined by the architecture is necessary. One cycle of the delay slot
is also necessary for the jump instruction. If the branch condition of
the branch likely instruction is not satisfied, the instruction in the
branch delay slot is invalidated.
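The delay-slot and fall-through rules for ordinary and branch likely instructions can be summarized in a small C sketch. The names are hypothetical, and a 4-byte instruction size is assumed, so a branch that is not taken continues at the branch address plus 8 (after the delay slot).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     exec_slot;  /* is the delay-slot instruction executed? */
    uint64_t next_pc;    /* address fetched after the delay slot */
} branch_outcome;

static branch_outcome resolve_branch(uint64_t branch_pc, uint64_t target,
                                     bool is_likely, bool taken) {
    branch_outcome o;
    /* Ordinary branches always execute the delay slot; branch likely
       instructions invalidate it when the condition is not satisfied. */
    o.exec_slot = !is_likely || taken;
    /* Not taken: continue after the delay slot (branch + 4 + 4). */
    o.next_pc = taken ? target : branch_pc + 8;
    return o;
}
```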
The following symbols used in the instruction formats in Table 3-15 through
Table 3-21 have special meanings.

REGIMM : op code
sub    : sub operation code
CO     : sub operation identifier
BC     : BC sub operation code
br     : branch condition identifier
cofun  : coprocessor function area
op     : operation code

Table 3-15 Branch Instructions
Instruction | Format and Description | op | rs | rt | offset
Branch On Equal BEQ rs, rt, offset
Branches to the branch address if register rs equals register rt.

Branch On Not Equal BNE rs, rt, offset
Branches to the branch address if register rs is not equal to register rt.
Branch On Less Than Or Equal To Zero BLEZ rs, offset
Branches to the branch address if register rs is less than or equal to 0.
Branch On Greater Than Zero BGTZ rs, offset
Branches to the branch address if register rs is greater than 0.


Instruction | Format and Description | REGIMM | rs | sub | offset
Branch On Less Than Zero BLTZ rs, offset
Branches to the branch address if register rs is less than 0.
Branch On Greater Than Or Equal To Zero BGEZ rs, offset
Branches to the branch address if register rs is greater than or equal to 0.
Branch On Less Than Zero And Link BLTZAL rs, offset
Stores the address of the instruction following the delay slot to register r31
(link register), and branches to the branch address if register rs is less than 0.
Branch On Greater Than Or Equal To Zero And Link BGEZAL rs, offset
Stores the address of the instruction following the delay slot to register r31
(link register), and branches to the branch address if register rs is greater
than or equal to 0.
Table 3-16 Branch Instructions (Extended ISA)


Instruction | Format and Description | op | rs | rt | offset

Branch On Equal Likely BEQL rs, rt, offset
Branches to the branch address if registers rs and rt are equal. If the branch
condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Not Equal Likely BNEL rs, rt, offset
Branches to the branch address if registers rs and rt are not equal. If the
branch condition is not satisfied, the instruction in the branch delay slot is
discarded.

Branch On Less Than Or Equal To Zero Likely BLEZL rs, offset
Branches to the branch address if register rs is less than or equal to 0. If the
branch condition is not satisfied, the instruction in the branch delay slot is
discarded.

Branch On Greater Than Zero Likely BGTZL rs, offset
Branches to the branch address if register rs is greater than 0. If the branch
condition is not satisfied, the instruction in the branch delay slot is discarded.


Instruction | Format and Description | REGIMM | rs | sub | offset

Branch On Less Than Zero Likely BLTZL rs, offset
Branches to the branch address if register rs is less than 0. If the branch
condition is not satisfied, the instruction in the branch delay slot is discarded.

Branch On Greater Than Or Equal To Zero Likely BGEZL rs, offset
Branches to the branch address if register rs is greater than or equal to 0. If the
branch condition is not satisfied, the instruction in the branch delay slot is
discarded.

Branch On Less Than Zero And Link Likely BLTZALL rs, offset
Stores the address of the instruction following the delay slot to register r31
(link register). Branches to the branch address if register rs is less than 0. If
the branch condition is not satisfied, the instruction in the branch delay slot
is discarded.

Branch On Greater Than Or Equal To Zero And Link Likely BGEZALL rs, offset
Stores the address of the instruction following the delay slot to register r31
(link register). Branches to the branch address if register rs is greater than or
equal to 0. If the branch condition is not satisfied, the instruction in the
branch delay slot is discarded.
3.2.4 Special Instructions


The special instructions generate exceptions by software. SYSCALL and BREAK
are R-type instructions. The trap instructions are invalid with the VR3000
Series; all the other instructions are valid with all the VR Series.


Table 3-17 Special Instructions 


Instruction | Format and Description | SPECIAL | rs | rt | rd | sa | funct


Synchronize SYNC 
Completes the load/store instructions currently in the pipeline before any new
load/store instruction is executed.


System Call SYSCALL 
Generates a system call exception and transfers control to the exception 
processing program. 


Breakpoint BREAK 
Generates a breakpoint exception and transfers control to the exception 
processing program. 


Table 3-18 Special Instructions (Extended ISA) (1/2)


Instruction | Format and Description | SPECIAL | rs | rt | rd | sa | funct


Trap If Greater Than Or Equal TGE rs, rt
Compares registers rs and rt as signed integers. If register rs is greater than
or equal to rt, generates an exception.


Trap If Greater Than Or Equal Unsigned TGEU rs, rt
Compares registers rs and rt as unsigned integers. If register rs is greater than
or equal to rt, generates an exception.


Trap If Less Than TLT rs, rt
Compares registers rs and rt as signed integers. If register rs is less than rt,
generates an exception.


Trap If Less Than Unsigned TLTU rs, rt
Compares registers rs and rt as unsigned integers. If register rs is less than rt,
generates an exception.


Trap If Equal TEQ rs, rt
Generates an exception if registers rs and rt are equal.


Trap If Not Equal TNE rs, rt
Generates an exception if registers rs and rt are not equal.
Table 3-18 Special Instructions (Extended ISA) (2/2)
Instruction | Format and Description | REGIMM | rs | sub | immediate

Trap If Greater Than Or Equal Immediate TGEI rs, immediate
Compares the contents of register rs with the sign-extended 16-bit immediate as
signed integers. If the contents of rs are greater than or equal to the immediate,
generates an exception.

Trap If Greater Than Or Equal Immediate Unsigned TGEIU rs, immediate
Compares the contents of register rs with the sign-extended 16-bit immediate as
unsigned integers. If the contents of rs are greater than or equal to the
immediate, generates an exception.

Trap If Less Than Immediate TLTI rs, immediate
Compares the contents of register rs with the sign-extended 16-bit immediate as
signed integers. If the contents of rs are less than the immediate, generates an
exception.

Trap If Less Than Immediate Unsigned TLTIU rs, immediate
Compares the contents of register rs with the sign-extended 16-bit immediate as
unsigned integers. If the contents of rs are less than the immediate, generates
an exception.

Trap If Equal Immediate TEQI rs, immediate
Generates an exception if the contents of register rs are equal to the immediate.

Trap If Not Equal Immediate TNEI rs, immediate
Generates an exception if the contents of register rs are not equal to the
immediate.
3.2.5 Coprocessor Instructions


The coprocessor instructions are used to operate each coprocessor. The
coprocessor load and store instructions are I-type. The format of the operation
instructions differs from coprocessor to coprocessor. Table 3-19 shows the
coprocessor instructions valid for all the VR Series. Table 3-20 lists the
coprocessor instructions valid only with the VR4000, which are defined as the
extended ISA.


Table 3-19 Coprocessor Instructions (1/2)


Instruction | Format and Description | op | base | rt | offset


Load Word To Coprocessor z LWCz rt, offset (base)
Sign-extends and adds offset to register base to generate an address.
Loads the contents of the word specified by the address to the general
purpose register rt of coprocessor z.


Store Word From Coprocessor z SWCz rt, offset (base)
Sign-extends and adds offset to register base to generate an address.
Stores the contents of the general purpose register rt of coprocessor z to the
memory position specified by the address.


Instruction | Format and Description | COPz | sub | rt | rd | 0


Move To Coprocessor z MTCz rt, rd
Transfers the contents of CPU register rt to the general purpose register rd of
coprocessor z.


Move From Coprocessor z MFCz rt, rd
Transfers the contents of the general purpose register rd of coprocessor z to
CPU register rt.


Move Control To Coprocessor z CTCz rt, rd
Transfers the contents of CPU register rt to the coprocessor control register
rd of coprocessor z.


Move Control From Coprocessor z CFCz rt, rd
Transfers the contents of the coprocessor control register rd of coprocessor z
to CPU register rt.


Instruction | Format and Description | COPz | CO | cofun


Coprocessor z Operation COPz cofun
Coprocessor z executes an operation defined for each coprocessor.
The status of the CPU is not changed by the operation of the coprocessor.
Table 3-19 Coprocessor Instructions (2/2)


Instruction | Format and Description | COPz | BC | br | offset


Branch On Coprocessor z True BCzT offset
Shifts the 16-bit offset 2 bits to the left and sign-extends it to 32 bits. Adds
the result to the address of the instruction in the delay slot to calculate the
branch address.
If the condition signal of coprocessor z is true, branches to the branch
address, delayed by one instruction.


Branch On Coprocessor z False BCzF offset
Shifts the 16-bit offset 2 bits to the left and sign-extends it to 32 bits. Adds
the result to the address of the instruction in the delay slot to calculate the
branch address.
If the condition signal of coprocessor z is false, branches to the branch
address, delayed by one instruction.


Table 3-20 Coprocessor Instructions (Extended ISA) (1/2)


Instruction | Format and Description | COPz | sub | rt | rd | 0
Doubleword Move To Coprocessor z DMTCz rt, rd
Transfers the contents of the general purpose register rt of the CPU to the
general purpose register rd of coprocessor z.
Doubleword Move From Coprocessor z DMFCz rt, rd
Transfers the contents of the general purpose register rd of coprocessor z to
the general purpose register rt of the CPU.


Instruction | Format and Description | op | base | rt | offset
Load Doubleword To Coprocessor z LDCz rt, offset (base)
Sign-extends and adds offset to register base to generate an address.
Loads the contents of the doubleword specified by the address to the general
purpose register (rt if FR = 1; rt and rt+1 if FR = 0) of coprocessor z.


Store Doubleword From Coprocessor z SDCz rt, offset (base)
Sign-extends and adds offset to register base to generate an address.
Stores the contents of the doubleword of the general purpose register
(rt if FR = 1; rt and rt+1 if FR = 0) of coprocessor z to the memory
position specified by the address.
Table 3-20 Coprocessor Instructions (Extended ISA) (2/2)


Instruction | Format and Description | COPz | BC | br | offset
Branch On Coprocessor z True Likely BCzTL offset
Shifts the 16-bit offset 2 bits to the left and sign-extends it. Adds the result
to the address of the instruction in the delay slot to calculate the branch
address.
If the condition signal of coprocessor z is true, branches to the branch
address, delayed by one instruction.
If the branch condition is not satisfied, the instruction in the branch delay slot
is discarded.


Branch On Coprocessor z False Likely BCzFL offset
Shifts the 16-bit offset 2 bits to the left and sign-extends it. Adds the result
to the address of the instruction in the delay slot to calculate the branch
address.
If the condition signal of coprocessor z is false, branches to the branch
address, delayed by one instruction.
If the branch condition is not satisfied, the instruction in the branch delay slot
is discarded.
3.2.6 System Control Coprocessor (CP0) Instructions


The system control coprocessor (CP0) instructions operate on the CP0
registers to control the memory management of the processor and to perform
exception processing.


Table 3-21 System Control Coprocessor (CP0) Instructions (1/2)


Instruction | Format and Description | COP0 | sub | rt | rd | 0


Move To System Control Coprocessor MTC0 rt, rd
Loads the contents of the word of the general purpose register rt of the CPU
to the general purpose register rd of CP0.

Move From System Control Coprocessor MFC0 rt, rd
Loads the contents of the word of the general purpose register rd of CP0 to
the general purpose register rt of the CPU.

Doubleword Move To System Control Coprocessor DMTC0 rt, rd
Loads the contents of the doubleword of the general purpose register rt of the
CPU to the general purpose register rd of CP0.

Doubleword Move From System Control Coprocessor DMFC0 rt, rd
Loads the contents of the doubleword of the general purpose register rd of
CP0 to the general purpose register rt of the CPU.


Instruction | Format and Description | COP0 | CO | funct

Read Indexed TLB Entry TLBR
Loads the TLB entry indicated by the index register to the entry Hi, entry
Lo0, entry Lo1, and page mask registers.

Write Indexed TLB Entry TLBWI
Loads the contents of the entry Hi, entry Lo0, entry Lo1, and page mask
registers to the TLB entry indicated by the index register.

Write Random TLB Entry TLBWR
Loads the contents of the entry Hi, entry Lo0, entry Lo1, and page mask
registers to the TLB entry indicated by the random register.

Probe TLB For Matching Entry TLBP
Loads the address of the TLB entry matching the contents of the entry
Hi register to the index register.

Return From Exception ERET
Returns from an exception, interrupt, or error trap.




Table 3-21 System Control Coprocessor (CP0) Instructions (2/2)


Instruction | Format and Description | CACHE | base | op | offset


Cache Operation CACHE op, offset (base)
Sign-extends the 16-bit offset to 32 bits and adds it to register base to
generate a virtual address. The virtual address is converted into a physical
address by using the TLB, and the cache operation indicated by the 5-bit sub op
code is executed at that address.
This chapter describes the operation of the VR4300 processor pipeline.

4.1 General


The VR4300 uses a 5-stage pipeline. The pipeline is normally controlled by the
pipeline clock, which is determined by the value of the DivMode(1:0)* pins. This
pipeline clock is called PClock, and one cycle of it is called a PCycle. Each stage
of the pipeline is executed in 1 PCycle. Each PCycle has two phases, Φ1 and Φ2,
as shown in Figure 4-1. Therefore, at least 5 PCycles are required to execute an
instruction. If the necessary data is not in the cache and must be fetched from
main memory, more cycles are necessary. When the pipeline flows smoothly, five
instructions are executed simultaneously.


* DivMode(1:0) in the VR4300 and VR4305; DivMode(2:0) in the VR4310.


[Figure 4-1 Pipeline Stages: MasterClock cycle and PCycle; each PCycle consists of phases Φ1 and Φ2; stages IC, RF, EX, DC, WB]


The five pipeline stages are: 
IC - Instruction Cache Fetch
RF - Register Fetch
EX - Execution
DC - Data Cache Fetch
WB - Write Back
Figure 4-2 outlines the pipeline. The horizontal rows in this figure show the
execution of individual instructions, and the vertical columns show the five
processes executed at the same time.


[Figure 4-2 Instruction Execution in the Pipeline: 5-deep; successive instructions each pass through IC, RF, EX, DC, and WB, offset by one cycle; the current CPU cycle is marked]
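The overlap in Figure 4-2 follows from each stage taking one PCycle. The C sketch below (hypothetical names) assumes an ideal pipeline with no cache misses, interlocks, or branch effects, and prints which stage each instruction occupies in each cycle.

```c
#include <stdio.h>

static const char *const STAGES[5] = { "IC", "RF", "EX", "DC", "WB" };

/* Stage occupied by instruction `instr` (0-based) in PCycle `cycle`,
   or NULL if the instruction is not in the pipeline in that cycle. */
static const char *stage_of(int instr, int cycle) {
    int s = cycle - instr;
    return (s >= 0 && s < 5) ? STAGES[s] : NULL;
}

static void print_pipeline(int n_instr) {
    int n_cycles = n_instr + 4;   /* the last instruction needs 4 more PCycles */
    for (int i = 0; i < n_instr; i++) {
        printf("I%d:", i + 1);
        for (int c = 0; c < n_cycles; c++) {
            const char *s = stage_of(i, c);
            printf(" %s", s ? s : "  ");
        }
        putchar('\n');
    }
}
```

Calling `print_pipeline(5)` prints the staircase of Figure 4-2. In cycle 4 (counting from 0) each of the five stages holds a different instruction, and n instructions finish in n + 4 PCycles under these assumptions.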
4.1.1 Pipeline Operations


Figure 4-3 shows the operations that can occur during each pipeline stage; Table 
4-1 describes these pipeline activities. 


[Figure 4-3 Pipeline Operations: operations in each stage and phase for instruction fetch, computational, load/store, and branch instructions (ICF, ITLB, ITC, RFR, IDEC, IVA, ALU, DVA, DCR, DTLB, LA, DTC, DCW, RFW)]
Table 4-1 Description of Pipeline Showing Stage in Which Operations Commence


Cycle | Phase | Mnemonic | Description
IC    | Φ1    | -        | -
      | Φ2    | ICF      | Instruction Cache Fetch
      |       | ITLB     | Instruction micro-TLB read
RF    | Φ1    | ITC      | Instruction cache Tag Check
      | Φ2    | RFR      | Register File Read
      |       | IDEC     | Instruction DECode
      |       | IVA      | Instruction Virtual Address calculation
EX    | Φ1    | BCMP     | Branch Compare
      |       | ALU      | Arithmetic Logic operation
      |       | DVA      | Data Virtual Address calculation
DC    | Φ1    | DCR      | Data Cache Read
      |       | DTLB     | Data joint-TLB read
      | Φ2    | LA       | Load data Alignment
      |       | DTC      | Data cache Tag Check
WB    | Φ1    | DCW      | Data Cache Write
      |       | RFW      | Register File Write
      | Φ2    | -        | -
4.2 Branch Delay


The pipeline of the VR4300 generates a branch delay of one cycle in the following
cases:


• When a target address is calculated by a jump instruction

• When the branch condition of a branch instruction is satisfied and a
  target address is calculated


The instruction address generated in the EX stage of a jump/branch instruction
cannot be used until the IC stage of the instruction that follows the delay slot.


Figure 4-4 illustrates the branch delay and the location of the branch delay slot. 


[Figure 4-4 Branch Delay: the branch instruction, the instruction in the branch delay slot (a single branch delay instruction), and the target instruction each pass through IC, RF, EX, DC, and WB; the branch delay spans from the branch to the target]
4.3 Load Delay


A load instruction whose result cannot be used by the instruction immediately
following it is called a delayed load instruction. The instruction slot
immediately following such a delayed load instruction is referred to as the load
delay slot.


In the VR4300 processor, the instruction immediately following a load instruction
can use the contents of the loaded register; however, in such cases hardware
interlocks insert additional delay cycles. Consequently, scheduling load delay
slots can be desirable, both for performance and for VR-Series processor
compatibility.


4.4 Pipeline Operation 


The operation of the pipeline is illustrated by the following examples that describe 
how typical instructions are executed. The instructions described are: ADD, 
JALR, BEQ, TLT, LW, and SW. Each instruction is taken through the pipeline 
and the operations that occur in each relevant stage are described. 


Floating-point instructions are executed in the pipeline in the same manner as 
multicycle integer instructions. 
Add Instruction


ADD rd,rs,rt


IC stage In phase 2 of the IC stage, the fourteen low-order bits of the
virtual address are used to address the instruction cache. The
two high-order bits of these fourteen bits select one of four
instruction cache banks, and the remaining bits address the
selected bank. The ITLB selects the page.


RF stage In phase 1 of the RF stage, the cache index is compared with the
page frame number from the ITLB and the cache data is read out.
The cache hit/miss signal is valid late in phase 1 of the RF stage,
and the virtual PC is incremented by 4 so that the next
instruction can be fetched.

During phase 2, the registers addressed by the rs and rt fields are read
from the 2-port register file, and the register data is valid at the register
file output. At the same time, bypass multiplexers select inputs from either
the EX- or DC-stage output in addition to the register file output,
depending on the need for an operand bypass.


EX stage The ALU controls are set to do an A+B operation. The operands
flow into the ALU inputs, and the ALU operation is started. The
result of the ALU operation is latched into the ALU output latch
during phase 2.


DC stage This stage is a NOP for this instruction. The data from the
output of the EX stage (the ALU) is moved into the output latch
of the DC stage.


WB stage During phase 1, the WB latch feeds the data to the inputs of the
register file, which is addressed by the rd field. The file write
strobe is enabled. By the end of phase 1, the data is written into
the register file.
[Figure: pipeline timing of ADD across the IC, RF, EX, DC and WB stages, each divided into phases Φ1 and Φ2; operations shown include ICF, ITC, IDEC, RFR and ALU]

Figure 4-5 Add Instruction Pipeline Operations
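As an illustration of the IC-stage indexing, the sketch below splits a 32-bit virtual address into the instruction cache index, the bank number, and the address within the bank. It assumes that "the two high-order bits" refers to bits 13:12 of the fourteen-bit index; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Fields derived from a virtual address for an instruction cache access
   (hypothetical model of the IC-stage description). */
typedef struct {
    uint32_t index;     /* VA[13:0]: fourteen low-order bits */
    uint32_t bank;      /* VA[13:12]: one of four banks (assumed bits) */
    uint32_t in_bank;   /* VA[11:0]: location inside the selected bank */
} ICacheAddr;

ICacheAddr icache_split(uint32_t va)
{
    ICacheAddr a;
    a.index   = va & 0x3FFFu;            /* fourteen low-order bits */
    a.bank    = (a.index >> 12) & 0x3u;  /* two high-order bits of the index */
    a.in_bank = a.index & 0xFFFu;        /* remaining twelve bits */
    return a;
}
```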
Jump and Link Register Instruction


JALR rd,rs 


IC stage Same as the IC stage for the ADD instruction. 


RF stage During phase 2 of the RF stage, the register addressed by the rs 
field is read out of the file. 


EX stage During phase 1 of the EX stage, the value of register rs is
clocked into the virtual PC latch. This value is used in phase 2 to
fetch the next instruction.


The value of the virtual PC incremented during the RF stage is 
incremented again to produce the link address PC+8 where PC is 
the address of the JALR instruction. The resulting value is the PC 
to which the program will eventually return from the jump 
destination. This value is placed in the Link output latch of the 
Instruction Address unit. 


DC stage The PC+8 value is moved from the Link output latch to the 
output latch of the DC pipeline stage. 


WB stage Refer to the ADD instruction. Note that if no value is explicitly 
provided for rd then register 31 is used as the default. If rd is 
explicitly specified, it cannot be the same register addressed by 
rs; if it is, the result of executing such an instruction is 
undefined. 


[Figure: pipeline timing of JALR across the IC, RF, EX, DC and WB stages (phases Φ1 and Φ2); operations shown: ICF, ITLB, IVA, ITC, IDEC, RFR, ALU and RFW]

Figure 4-6 Jump and Link Register Instruction Pipeline Operations
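The register and link-address handling of JALR can be sketched as a small software model. This is not the hardware implementation: the `jalr` function and `JalrResult` type are hypothetical, and `rd < 0` is an assumed convention for "rd not specified". The link address PC+8 skips both the JALR itself and the instruction in its branch delay slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of a hypothetical JALR model. */
typedef struct {
    bool     defined;   /* false when rd == rs (result undefined) */
    uint32_t next_pc;   /* jump destination taken from GPR[rs] */
    int      link_reg;  /* register that receives the link address */
    uint32_t link_addr; /* PC + 8: return point after the delay slot */
} JalrResult;

/* rd < 0 stands for "rd not specified in the source" (assumed). */
JalrResult jalr(uint32_t pc, const uint32_t gpr[32], int rs, int rd)
{
    JalrResult r;
    r.link_reg  = (rd < 0) ? 31 : rd;   /* default link register is r31 */
    r.defined   = (r.link_reg != rs);   /* rs and rd must differ */
    r.next_pc   = gpr[rs];
    r.link_addr = pc + 8;
    return r;
}
```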
Branch on Equal Instruction


BEQ rs,rt,offset 


IC stage Same as the IC stage for the ADD instruction.


RF stage During phase 2, the register file is addressed with the rs and rt
fields and the contents of these registers are placed in the register
file output latch.


EX stage During phase 1, a check is performed to determine if each
corresponding bit position of these two operands has equal
values. If they are equal, the PC is set to PC+target, where
target is the sign-extended offset field shifted left by two bits.
If they are not equal, the PC is set to PC+4.

The next PC resulting from the branch comparison is valid at the
beginning of phase 2 for instruction fetch.


DC stage This stage is a NOP for this instruction.


WB stage This stage is a NOP for this instruction.


[Figure: pipeline timing of BEQ across the IC, RF, EX, DC and WB stages (phases Φ1 and Φ2); operations shown: ICF, ITLB, IVA, ITC, IDEC, RFR and BCMP]

Figure 4-7 Branch on Equal Instruction Pipeline Operations
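The next-PC choice made in the EX stage of BEQ can be sketched as follows. The sketch assumes the MIPS ISA convention in which the 16-bit offset is sign-extended, multiplied by 4, and added to the address of the delay-slot instruction, so `delay_slot_pc` plays the role of "PC" in the stage description. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the 16-bit offset field and scale it by 4, because
   instructions are word aligned. */
int32_t branch_offset(uint16_t offset_field)
{
    return (int32_t)(int16_t)offset_field * 4;
}

/* Next fetch address after BEQ. delay_slot_pc is the address of the
   instruction following the branch (branch address + 4). */
uint32_t beq_next_pc(uint32_t rs_val, uint32_t rt_val,
                     uint32_t delay_slot_pc, uint16_t offset_field)
{
    bool equal = ((rs_val ^ rt_val) == 0);   /* every bit position equal */
    if (equal)
        return delay_slot_pc + (uint32_t)branch_offset(offset_field);
    return delay_slot_pc + 4;                /* continue past the delay slot */
}
```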
Trap if Less Than Instruction


TLT rs,rt 


IC stage Same as the IC stage for the ADD instruction.


RF stage Same as the RF stage for the ADD instruction.


EX stage During phase 1, the bypass multiplexers select inputs from
the RF-, EX- or DC-stage output latch, depending on the need
for an operand bypass. The ALU controls are set to do an A-B
operation. The operands flow into the ALU inputs, and the ALU
operation is started.

The result of the ALU operation is latched into the ALU output
latch during phase 2.


DC stage The sign bits of the operands and of the ALU output latch are
checked to determine if a less than condition is true. If this
condition is true, a Trap exception occurs. As with all pipeline
exceptions, this implies a 2-cycle stall. The PC register is
loaded with the value of the exception vector, and the instructions
that follow in earlier pipeline stages are killed.


WB stage If the less than condition was met in the DC stage, the exception
code is set in the ExCode field of the Cause register. The PC
value of this instruction is stored in the EPC register and the BD
bit is updated, as appropriate for the contents of the EXL bit of
the Status register. If the less than condition was not met in the
DC stage, no activity occurs in the WB stage.


[Figure: pipeline timing of TLT across the IC, RF, EX, DC and WB stages (phases Φ1 and Φ2); operations shown: ICF, ITLB, IVA, ITC, IDEC, RFR, ALU and RFW]

Figure 4-8 Trap if Less Than Instruction Pipeline Operations
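The DC-stage decision can be reproduced in software from sign bits alone. In this sketch (32-bit operands assumed; the function name is hypothetical), when the operand signs differ the negative operand is the smaller one, and when they agree the subtraction A-B cannot overflow, so the sign bit of the result decides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed "a < b" computed from the sign bits of a, b and a - b. */
bool tlt_condition(uint32_t a, uint32_t b)
{
    uint32_t diff   = a - b;      /* ALU A-B, wraps modulo 2^32 */
    uint32_t sign_a = a >> 31;
    uint32_t sign_b = b >> 31;
    uint32_t sign_d = diff >> 31;
    if (sign_a != sign_b)
        return sign_a == 1;       /* a negative and b non-negative */
    return sign_d == 1;           /* same signs: no overflow possible */
}
```

If `tlt_condition` returns true, the model corresponds to the case where the Trap exception is taken.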
Load Word Instruction


LW rt,offset(base) 


IC stage Same as the IC stage for the ADD instruction.


RF stage Same as the RF stage for the ADD instruction. Note that the base
field is in the same position as the rs field.


EX stage Refer to the EX stage for the ADD instruction. For LW, the
inputs to the ALU come from GPR[base] through the bypass
multiplexer and from the sign-extended offset field. The result of
the ALU operation that is latched into the ALU output latch in
phase 2 represents the effective virtual address of the operand
(DVA).


DC stage The data cache is accessed in parallel with the TLB, and the
cache tag field is compared with the Page Frame Number (PFN)
field of the TLB entry. After passing through the load aligner,
aligned data is placed in the DC output latch during phase 2.


WB stage During phase 1, the cache read data is written into the register
file location addressed by the rt field.


[Figure: pipeline timing of LW across the IC, RF, EX, DC and WB stages (phases Φ1 and Φ2); operations shown: ICF, ITLB, ITC, IDEC, RFR, DVA, DTLB, DCR, DTC, LA and RFW]

Figure 4-9 Load Word Instruction Pipeline Operations
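The EX-stage address computation for LW is a base-plus-displacement addition. The sketch below assumes 32-bit addresses; the alignment check reflects the MIPS requirement that LW addresses be multiples of 4, with a misaligned address raising a Data Address Error exception (DADE in Table 4-2). Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective virtual address (DVA) of LW rt, offset(base). */
uint32_t lw_effective_address(uint32_t base_value, uint16_t offset_field)
{
    int32_t offset = (int16_t)offset_field;   /* sign-extend the offset */
    return base_value + (uint32_t)offset;     /* wraps modulo 2^32 like the ALU */
}

/* LW needs a word-aligned address; otherwise an address error is raised. */
bool lw_address_ok(uint32_t dva)
{
    return (dva & 3u) == 0;
}
```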
Store Word Instruction


SW rt,offset(base) 


IC stage Same as the IC stage for the ADD instruction.


RF stage Same as the RF stage for the LW instruction.


EX stage Refer to the LW instruction for a calculation of the effective
address. From the RF output latch, GPR[rt] is sent through
the bypass multiplexer and into the main shifter, where the
shifter performs the byte-alignment operation for the operand.
The results of the ALU and the shift operations are latched in the
output latches during phase 2.


DC stage Refer to the LW instruction for a description of the cache access.
Additionally, the merged data from the load aligner is moved into
the store data output latch during phase 2.


WB stage If there was a cache hit, the content of the store data output latch
is written into the data cache at the appropriate word location.


Note that all store instructions use the data cache for two
consecutive PCycles. If the following instruction requires use of
the data cache, the pipeline is stalled for one PCycle to complete
the writing of the aligned store data.


[Figure: pipeline timing of SW across the IC, RF, EX, DC and WB stages (phases Φ1 and Φ2); operations shown: ICF, ITLB, ITC, IDEC, RFR, DVA, DTLB, DCR, DTC, LA and DCW]

Figure 4-10 Store Word Instruction Pipeline Operations
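To illustrate what byte alignment and merging mean for SW, the sketch below places a 32-bit word into the 64-bit doubleword that contains it and merges it with the data already there. It assumes big-endian byte ordering and does not describe the actual shifter or load-aligner circuitry; the function name is hypothetical.

```c
#include <stdint.h>

/* Merge a 32-bit store into its containing doubleword. With big-endian
   ordering (assumed), a word whose address has bit 2 clear occupies the
   upper half of the doubleword. */
uint64_t store_word_merge(uint64_t old_dword, uint32_t word, uint32_t addr)
{
    unsigned shift = (addr & 4u) ? 0u : 32u;         /* byte-lane alignment */
    uint64_t mask  = (uint64_t)0xFFFFFFFFu << shift;  /* bytes being replaced */
    return (old_dword & ~mask) | ((uint64_t)word << shift);
}
```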
4.5 Interlock and Exception Handling


Smooth pipeline flow is interrupted when cache misses or exceptions occur, or 
when data dependencies are detected. Interruptions handled using hardware, such 
as cache misses, are referred to as interlocks, while those that are handled using 
software are called exceptions. 


As shown in Figure 4-11, all interlock and exception conditions are collectively 
referred to as faults. 


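[Figure: faults are divided into interlocks, which are handled by hardware, and exceptions, which are handled by software]

Figure 4-11 Interlocks, Exceptions, and Faults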


At each cycle, exception and interlock conditions are checked for all active 
instructions. 


Because each exception or interlock condition corresponds to a particular pipeline
stage, a condition can be traced back to the particular instruction in the
exception/interlock stage, as shown in Figure 4-12. For instance, an LDI interlock
is raised in the execution (EX) stage.


Tables 4-2 and 4-3 describe the pipeline interlocks and exceptions listed in Figure 
4-12. 
[Figure: the IC, RF, EX, DC and WB pipeline stages over successive PCycles (phases Φ1 and Φ2), listing under each stage the interlock conditions (for example ITM, LDI, DCM and CP0I) and the exception conditions detected there]


Remark The conditions of the exceptions are shown starting from the
exception with the highest priority.


Figure 4-12. Correspondence of Pipeline Stage to Interlock and Exception Condition
Table 4-2. Description of Pipeline Exceptions

Exception  Description
IADE       Instruction Address Error Exception
ITLB       Instruction TLB Exception
IBE        Instruction Bus Error Exception
SYSC       SYSCALL Instruction Exception
BRPT       Breakpoint Instruction Exception
CPU        Coprocessor Unusable Exception
RSVD       Reserved Instruction Exception
RST        External Reset Exception
NMI        External NMI Exception
OVFL       Integer Overflow Exception
TRAP       TRAP Instruction Exception
FPE        Floating-point Exception
DADE       Data Address Error Exception
DTLB       Data TLB Exception
WAT        Reference to Watch Address Exception
INTR       Interrupt Exception
DBE        Data Bus Error Exception


Table 4-3 Description of Pipeline Interlocks

Interlock  Description
ITM        Instruction TLB Miss
ICB        Instruction Cache Busy
LDI        Load Interlock
MCI        Multi-cycle Interlock
DCM        Data Cache Miss
DCB        Data Cache Busy
COp        Cache Op
CP0I       CP0 Bypass Interlock
4.6 Pipeline Interlocks and Exceptions


When an interlock or exception condition arises, pipeline flow is interrupted. 
Depending upon whether the condition is an interlock or an exception, one of the 
following occurs: 


• If an interlock condition arises, the pipeline remains stalled until the
  interlock is corrected by hardware.

• If an exception occurs, the exception-causing instruction and all
  instructions that follow it in the pipeline are aborted, the exception is
  resolved by software, and the pipeline is restarted and reloaded.


Pipeline interlocks and pipeline exceptions are described in the following sections.
The exceptions themselves are described in Chapter 6 Exception Processing.


Bypassing, which allows data and conditions produced in the EX, DC and WB 
stages of the pipeline to be made available to the EX stage of the next cycle, is also 
described in this section. 


4.6.1 Pipeline Interlocks 


When an interlock condition occurs, the pipeline stalls and remains stalled until 
the interlock is corrected. Should pipeline stall requests from different stages arise 
simultaneously, the Pipeline Control Unit prioritizes the stall requests. For 
instance, a stall request from the DC stage is always allowed to be resolved before 
a simultaneous RF-stage stall request, since both may require the same resource 
(TLB, memory) to be resolved. The EX stage is allowed to stall in order to 
complete a multicycle instruction as long as there is no load dependency between 
itself (the EX stage) and the DC stage. Interlock conditions for each pipeline stage 
are shown in Figure 4-12 and described in Table 4-3. 
The remainder of this section describes in detail the following pipeline interlocks: 

• Instruction TLB Miss (ITM)
• Instruction Cache Busy (ICB)
• Load Interlock (LDI)
• Multicycle Instruction Interlock (MCI)
• Data Cache Miss (DCM)
• Data Cache Busy (DCB)
• Cache Operation (COp)
• CP0 Bypass Interlock (CP0I)
4.6.2 Instruction TLB Miss (ITM)


A pipeline stall due to an Instruction TLB Miss occurs when the virtual address of 
the next instruction to be fetched is not found in the instruction micro-TLB 
(ITLB). 


The pipeline stalls when the micro-TLB miss is detected in the RF stage, 
whereupon the pipeline controller notifies the micro-TLB to proceed in servicing 
the stall. The pipeline starts running again when the micro-TLB has been updated 
from the JTLB. 


A miss penalty of 3 PCycles is incurred when the micro-TLB is updated from the 
JTLB. 


If the virtual address also misses in the JTLB, an exception is taken which 
overrides the stall to allow the handler to update the JTLB. Once the update is 
completed, the instruction fetch is re-executed. This initiates a repeat of the ITM 
stall until the micro-TLB is updated from the JTLB, which was just updated by the 
exception handler. 


[Figure: the instruction that misses in the ITLB is held in the RF stage for three stall cycles (ITLB miss, JTLB access, micro-TLB update) while the following instruction waits in the IC stage; both then proceed through EX, DC and WB]

Figure 4-13 Instruction TLB Miss Interlock
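The ITM behavior (a 3-PCycle update from the JTLB on a micro-TLB miss, and an exception when the JTLB also misses) can be modeled roughly as follows. The micro-TLB size, the round-robin replacement, and all names are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define UTLB_ENTRIES 2   /* hypothetical micro-TLB size, illustration only */
#define ITM_PENALTY  3   /* PCycles to update the micro-TLB from the JTLB */

typedef struct { bool valid; uint32_t vpn; } UtlbEntry;
typedef struct { UtlbEntry e[UTLB_ENTRIES]; unsigned next; } MicroTlb;

typedef enum { FETCH_OK, FETCH_TLB_EXCEPTION } FetchStatus;

/* Translate the page of an instruction fetch. in_jtlb stands in for the
   result of the JTLB lookup; *stall receives the ITM stall charged. */
FetchStatus fetch_translate(MicroTlb *u, uint32_t vpn, bool in_jtlb, int *stall)
{
    *stall = 0;
    for (int i = 0; i < UTLB_ENTRIES; i++)
        if (u->e[i].valid && u->e[i].vpn == vpn)
            return FETCH_OK;                  /* micro-TLB hit: no stall */
    if (!in_jtlb)
        return FETCH_TLB_EXCEPTION;           /* handler must update the JTLB */
    u->e[u->next].valid = true;               /* ITM: copy the JTLB entry in */
    u->e[u->next].vpn   = vpn;
    u->next = (u->next + 1) % UTLB_ENTRIES;   /* replacement policy assumed */
    *stall = ITM_PENALTY;
    return FETCH_OK;
}
```

After the exception handler has updated the JTLB, the fetch is retried and takes the 3-PCycle ITM stall, which is the sequence the test below walks through.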
4.6.3 Instruction Cache Busy (ICB)

A pipeline stall due to an Instruction Cache Busy interlock occurs when the next
instruction is not found in the instruction cache and the cache cannot service the
instruction fetch. The pipeline stalls when the instruction cache miss is detected
in the RF stage. After detecting the stall, the pipeline controller notifies the
instruction cache to proceed in servicing the stall.


The pipeline begins running again after the entire cache line has been written into 
the instruction cache. 


When the instruction cache is busy with a CACHE instruction and the Instruction 
Fetch cannot be serviced, a Cache Operation (COp) interlock is taken, not ICB. 


[Figure: after an instruction cache miss, the instruction in the RF stage stalls through the cache refill and instruction cache update while the following instructions are held in the IC stage; all then resume through EX, DC and WB]

Figure 4-14 Example of an Instruction Cache Busy Interlock
4.6.4 Multicycle Instruction Interlock (MCI)


A pipeline stall due to a Multicycle Interlock occurs when an instruction with an 
execution latency of more than one pipeline clock enters the EX stage. 


The pipeline begins running again during the multicycle instruction’s last clock of 
operation in the EX stage. 


[Figure: a multiply (MultA,B) occupies the EX stage for several cycles while the pipeline stalls; the following Read MultHi instruction is held in RF and Read MultLo in IC until the multiply's last EX clock]

Figure 4-15. Example of a Multicycle Instruction Interlock
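Because the pipeline restarts during the multicycle instruction's last EX clock, an instruction that needs n clocks in EX stalls the pipeline for n-1 PCycles. The sketch below (hypothetical names; it models a single instruction only) computes that count and writes out the stage sequence the instruction occupies.

```c
#include <stdio.h>
#include <string.h>

/* PCycles of MCI stall for an instruction needing ex_cycles clocks in EX. */
int mci_stall_cycles(int ex_cycles)
{
    return ex_cycles > 1 ? ex_cycles - 1 : 0;
}

/* Write the stage occupied in each cycle, e.g. "IC RF EX EX DC WB" for
   ex_cycles = 2. len must be greater than zero. */
void mci_timeline(int ex_cycles, char *out, size_t len)
{
    size_t used = 0;
    out[0] = '\0';
    used += (size_t)snprintf(out + used, len - used, "IC RF");
    for (int i = 0; i < ex_cycles && used < len; i++)
        used += (size_t)snprintf(out + used, len - used, " EX");
    if (used < len)
        snprintf(out + used, len - used, " DC WB");
}
```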
4.6.5 Load Interlock (LDI)

A pipeline stall due to a Load Interlock occurs when data fetched by a load
instruction is required by the immediately following instruction. The pipeline
stalls when the load-use instruction (the instruction using the load data) enters
the EX stage.


The pipeline begins running again in the clock after the target of the load is
read from the data cache (in the DC stage of the "Load B" instruction in
Figure 4-16).


The Load Interlock is normally only active for one PClock cycle when the load 
instruction is in the DC stage and the load-use instruction is in the EX stage. The 
data returned from the data cache at the end of the DC stage is input into the EX 
stage, using the bypass multiplexers. 


If the data cache misses, the Data Cache Busy interlock extends the stall until the
data cache has been updated with the missing data. The LDI is still active during
this time and extends the stall one clock beyond the Data Cache Busy interlock while
the data is bypassed from the data cache into the EX stage.


This case is illustrated in Figure 4-17. 


[Figure: Load A and Load B proceed normally; the interlock is detected when Add A,B reaches the EX stage, which it occupies for one extra cycle while Load B's data is read in the DC stage; the following instructions are held in RF and IC for that cycle]

Figure 4-16 Example of a Load Interlock
4.6.6 Data Cache Miss (DCM)


If a data cache miss occurs in the DC stage, the pipeline stalls for 1 PCycle, the
cycle in which the miss is detected. The pipeline stalls regardless of whether a
load or a store instruction is being executed. The Data Cache Busy interlock
(explained next) then continues the stall until a new cache line has been read.


When the requested data word has been read from the cache, the pipeline begins
running again.


Figure 4-17 illustrates DCM. 


4.6.7 Data Cache Busy (DCB) 


A pipeline stall due to the data cache being busy can occur in the following two
situations:


• If the instruction immediately after a store instruction requires use of
  the data cache, the pipeline is stalled in its DC stage while the
  store writes the data to the cache during its WB stage. On a cache
  store hit, the pipeline stalls for only one PClock while the data is
  written to the data cache. On a cache store miss, the pipeline stalls
  with the store in the DC stage until the cache line has been updated.
  Once the line has been updated, the pipeline restarts and moves the
  store instruction into the WB stage. If the instruction following the
  store (that is, the instruction currently in the DC stage) also requires
  access to the data cache, the pipeline then stalls for one PCycle
  while the store data is being written to the cache.


• When a miss occurs on a load, the data cache signals that it is busy
  while it fetches the missed data word from external memory. Refer to
  Figure 4-17.


The pipeline begins running again on a load when the missed data word is 
available from the data cache. 
[Figure: Run/Stall sequence for a load (Load C) that misses in the data cache and is followed by a load-use instruction (Add A,B); Add A,B is held in the EX stage through the miss stall, and the LDI extends the stall before it proceeds]

Figure 4-17 Example of a Data Cache Miss Followed by a Load Interlock
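The stall contributions described for DCM, DCB, and LDI can be combined into a rough cycle count. This sketch assumes an arbitrary, system-dependent `refill_cycles` value for the time the cache stays busy fetching the missing data, and it ignores write-buffer and CACHE-operation effects; all names are hypothetical.

```c
#include <stdbool.h>

/* hit:           the access hit in the data cache
   next_depends:  for a load, the next instruction uses the loaded register
                  (LDI); for a store, the next instruction uses the data
                  cache (DCB while the store data is written)
   refill_cycles: assumed, system-dependent busy time during a refill */
int dcache_stall_cycles(bool hit, bool next_depends, int refill_cycles)
{
    int stall = 0;
    if (!hit)
        stall += 1 + refill_cycles;   /* DCM cycle, then DCB during the refill */
    if (next_depends)
        stall += 1;                   /* LDI bypass cycle or store-write cycle */
    return stall;
}
```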


4.6.8 CACHE Operation (COp) 


A pipeline stall due to a CACHE operation can occur in the following two 
situations: 


• When an instruction cache operation instruction enters the DC stage,
  the instruction cache operation continues to be serviced while the
  pipeline stalls. The pipeline begins running again when the
  instruction cache operation is complete, allowing the next instruction
  fetch to proceed.


• When a data cache operation instruction that requires the data cache
  for 2 PCycles enters the DC stage.
4.6.9 Coprocessor 0 Bypass Interlock (CP0I)


A pipeline stall due to a CP0 Bypass Interlock occurs when an instruction that
caused an exception reaches the WB stage and the subsequent instruction in the
DC stage requests a read of any CP0 register.


This interlock causes a pipeline stall for one PCycle to allow the CP0 register to
be written in the WB stage before allowing any CP0 register to be read in the DC
stage.
[Figure: the instruction that causes an exception completes its WB stage in the first phase; the following Load LO instruction stays in its DC stage for one extra stall cycle before entering WB, and the later instructions are held for that cycle]

Figure 4-18 Example of a Coprocessor 0 Bypass Interlock (CP0I)
4.7 Pipeline Exceptions


When a pipeline exception condition occurs, the pipeline stalls for 2 PCycles and
the instruction causing the exception as well as all those that follow it in the
pipeline are aborted. Accordingly, any stall conditions and any later exception
conditions from any aborted instruction are inhibited; there is no benefit in
servicing stalls for an aborted instruction.


After the instructions are aborted, execution starts at a predefined exception
vector. System Control Coprocessor (CP0) registers are loaded with information
that identifies the type of exception as well as auxiliary information such as the
virtual address at which translation exceptions occur.


Exception conditions for each pipeline stage are shown in Figure 4-12 and
described in Table 4-2.


Exceptions can be split into two groups:


• those that occur independently of instruction execution (Reset, NMI,
  and interrupt exceptions)


• those that result from the execution of a particular instruction
  (instruction-dependent exceptions). This category includes all other
  exceptions.


Exceptions are logically precise. 


4.7.1 Instruction-Independent Exceptions (Reset, NMI, and Interrupt) 
Reset, NMI and interrupt exceptions are identified and processed as follows:


• The Reset exception has the highest priority of all the possible exceptions;
  when a Reset exception is asserted, instructions in all pipeline stages
  except WB are aborted regardless of any interlocks or other
  exceptions that may be active.


• NMI and interrupt exception requests are accepted only if the
  previous PCycle was a run cycle. When an NMI or interrupt
  exception occurs, all pipeline stages except WB are aborted.

4.7.2 Instruction-Dependent Exceptions


Priority between instruction-dependent exceptions and interlocks is determined
according to these rules:


• An exception request from a particular pipeline stage is only
  processed if no stall condition from a later pipeline stage is active.


• An exception request from a later pipeline stage always has a higher
  priority than an exception from an earlier pipeline stage.


• An exception request from a pipeline stage always has higher priority
  than any stall request from the same or earlier pipeline stages.
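The three rules for instruction-dependent exceptions can be expressed as a single scan from the latest pipeline stage to the earliest. The sketch below is a simplified, hypothetical model: each stage raises at most one exception request and one stall request per cycle, and the special handling of Reset, NMI, and interrupts is ignored.

```c
#include <stdbool.h>

typedef enum { ST_RF, ST_EX, ST_DC, ST_WB, ST_COUNT } Stage;

typedef struct {
    bool exception[ST_COUNT];   /* exception request raised by each stage */
    bool stall[ST_COUNT];       /* interlock (stall) request by each stage */
} Requests;

typedef enum { ACT_RUN, ACT_STALL, ACT_EXCEPTION } ActionKind;
typedef struct { ActionKind kind; Stage stage; } Action;

/* Scan from WB back to RF. The first stage with any request decides:
   an exception there beats stalls from the same or earlier stages, and
   a stall there blocks exceptions from earlier stages. */
Action resolve(const Requests *r)
{
    for (int s = ST_WB; s >= ST_RF; s--) {
        if (r->exception[s])
            return (Action){ ACT_EXCEPTION, (Stage)s };
        if (r->stall[s])
            return (Action){ ACT_STALL, (Stage)s };
    }
    return (Action){ ACT_RUN, ST_RF };
}
```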


4.7.3 Interactions between Interlocks and Exceptions 


With the VR4300, the processing of the EX and RF stages can continue while
the pipeline stalls. The interaction between the interlocks of these two stages
and exceptions is relatively simple.


Interaction between EX and RF Stages


An EX exception occurs only when the instruction that causes it has entered a
pipeline stage. Because resolution of the RF interlock has not yet started at this
time, the EX exception takes precedence over the stall request from the RF stage.
Interactions in various cases are described next.


When an EX exception is stalled by a DC interlock
The EX exception takes precedence over the RF stall request, because the RF
interlock is not resolved during the DC stall period.


If instruction cache busy and multicycle instruction interlocks take place
simultaneously

Both the RF and EX stages resolve their respective interlocks.

The cause of a floating-point exception may be detected before the
instruction cache busy (ICB) stall ends, but the exception occurs only after
execution has entered the DC stage. Therefore, the exception condition is
retained in the EX stage until the RF interlock is resolved, and the related
stage is then deleted.


If an exception from the EX stage and an RF interlock take place
simultaneously

The EX exception takes precedence, because the instruction that has caused
the RF interlock is canceled and no request is issued to the external memory.
Interaction between RF and DC Stages


If stall requests are made at the same time in the RF and DC stages, the pipeline
controller gives priority to the processing of the DC stage. In other words, the
RF stall processing is started after the DC stall has been resolved. This is because
the same resources (such as the system interface and TLB) are needed to resolve
both the RF interlock and the DC interlock.


4.7.4 Exception and Interlock Priorities 


The priority for processing exceptions and interlocks within the same clock cycle 
is listed below. Exception and interlock requests from the WB stage always have 
priority over exception and interlock requests from the DC stage. Exception and 
interlock requests from the DC stage always have priority over exception and 
interlock requests from the EX stage. EX-stage exception and interlock requests 
in turn always have priority over any exception and interlock requests from the RF 
stage. 


[Figure: the pipeline stages IC, RF, EX, DC and WB with priority increasing toward the later stages; the WB stage has the highest priority]


Figure 4-19 Execution and Interlock Priorities


In the case of multiple exception requests from the same pipeline stage, the
highest-priority exception is processed first. The priority of the instruction-
dependent exceptions and interlocks in each stage is shown in the following sections.
4.7.5 WB-Stage Interlock and Exception Priorities


Only one exception or interlock is processed in the WB stage, so no
prioritization is needed:


CP0 Bypass interlock


4.7.6 DC-Stage Interlock and Exception Priorities


Following is a prioritized list of the exceptions and interlocks processed in the DC
pipeline stage.


Reset exception (highest)
NMI exception
Integer Overflow exception
Trap exception
Floating-Point exception
Data Address Error exception
Data TLB Miss exception
Data TLB Invalid exception
Data TLB Modification exception
Watch exception
Interrupt exception
Data Cache Miss interlock
Data Cache Busy interlock
CACHE Op interlock
Data Bus Error exception
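Selecting among simultaneous DC-stage requests amounts to finding the first active entry in the prioritized list. In this hypothetical sketch the enumeration follows the order of the DC-stage list (Reset first, Data Bus Error last), and each condition is one bit in a request mask.

```c
#include <stdint.h>

/* DC-stage conditions in priority order (index 0 = highest). */
typedef enum {
    DC_RESET, DC_NMI, DC_OVFL, DC_TRAP, DC_FPE, DC_DADE,
    DC_DTLB_MISS, DC_DTLB_INVALID, DC_DTLB_MOD, DC_WATCH, DC_INTR,
    DC_DCM, DC_DCB, DC_COP, DC_DBE, DC_NONE
} DcCondition;

/* Bit n of `active` is set when condition n is requested this cycle.
   Returns the condition to service, or DC_NONE if nothing is active. */
DcCondition dc_select(uint32_t active)
{
    for (int c = DC_RESET; c < DC_NONE; c++)
        if (active & (1u << c))
            return (DcCondition)c;
    return DC_NONE;
}
```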
4.7.7 EX-Stage Interlock and Exception Priorities


Following is a prioritized list of the exceptions and interlocks processed in the EX
stage.


System Call exception
Breakpoint exception
Coprocessor Unusable exception
Reserved Instruction exception
Load interlock
Multicycle Instruction interlock


4.7.8 RF-Stage Interlock and Exception Priorities 


Following is a prioritized list of the exceptions and interlocks processed in the RF 
pipeline stage. 


Instruction Address Error exception
Instruction TLB Miss exception
Instruction TLB Invalid exception
Instruction TLB Miss interlock
Instruction Cache Busy interlock
Instruction Bus Error exception


If an Instruction Bus Error exception occurs during a cache refill while an
Instruction Cache Busy interlock is active, the instruction cache signals the
exception to the pipeline controller only after the cache refill is complete, and
therefore no stall is active.


Individual exceptions are described in detail in Chapter 6 Exception Processing. 
4.7.9 Bypassing


In some cases, data and conditions produced in the EX, DC and WB stages of the 
pipeline are made available to the EX stage (only) through the bypass datapath. 


Operand bypass allows an instruction in the EX stage to continue without having 
to wait for data or conditions to be written to the register file at the end of the WB 
stage. Instead, the Bypass Control Unit ensures data and conditions from later 
pipeline stages are available at the appropriate time for instructions earlier in the 
pipeline. 


The Bypass Control Unit also controls the source and destination register 
addresses supplied from the register file. 
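The operand selection made by the bypass multiplexers can be sketched as a priority choice among the stage output latches. This is a simplified model with hypothetical names: it assumes that the youngest producing instruction wins, and that register 0, which is hard-wired to zero in the MIPS architecture, is never bypassed.

```c
#include <stdint.h>

/* Hypothetical snapshot of a stage output latch that may forward a result. */
typedef struct {
    int      dest;    /* destination register, 0 if none */
    uint32_t value;   /* result held in that stage's output latch */
} StageResult;

typedef enum { SRC_REGFILE, SRC_EX, SRC_DC, SRC_WB } OperandSource;

/* Pick the source of an EX-stage operand: the most recent producer
   (EX output, then DC, then WB) wins; otherwise the register file value
   is used. */
OperandSource select_operand(int reg, const StageResult *ex,
                             const StageResult *dc, const StageResult *wb,
                             uint32_t regfile_value, uint32_t *out)
{
    if (reg != 0 && ex->dest == reg) { *out = ex->value; return SRC_EX; }
    if (reg != 0 && dc->dest == reg) { *out = dc->value; return SRC_DC; }
    if (reg != 0 && wb->dest == reg) { *out = wb->value; return SRC_WB; }
    *out = regfile_value;
    return SRC_REGFILE;
}
```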


4.8 Code Compatibility 


The VR4300 can execute any program that can be executed on the VR3000 series
and VR4000 series*, but the reverse is not necessarily true. Standard MIPS
compilers produce code that will run on both. When hand-coding assembly code,
it is strongly advised to maintain compatibility with the VR Series. For more
information, refer to each product's user's manual.


* The instruction set of the VR4100 differs partially from that of the other products.
(For example, FPU instructions are not supported.)
4.9 Write Buffer

The VR4300 processor contains an on-chip write buffer, used as temporary storage
for outgoing data. The write buffer stores one doubleword (8 bytes) of data each
PCycle and can buffer a total of eight words (32 bytes) of data, equal to the data
cache line size. Therefore, data of any length can be stored.


The write buffer can accept any data as long as it has a free entry.


The format of the write buffer is shown below.


    4 bits   32 bits             64 bits
    Size     Physical Address    Data
    Size     Physical Address    Data
    Size     Physical Address    Data
    Size     Physical Address    Data


Figure 4-20 Write Buffer Format


The write buffer can store the following:
• Four 32-bit physical addresses
• 4-bit size fields indicating four types of transfer data size
• Data of up to 4 doublewords


During an uncached store operation, data is held in this buffer until it can be 
retrieved by the external interface. The processor pipeline continues to execute 
while data is stored in the write buffer. 


During either a load miss or a store miss to a cache line in the dirty state (refer to 
Chapter 11 Cache Memory for a description of cache line states), dirty data is 
stored in this buffer until the requested data is returned from the external interface. 
The processor pipeline continues to run while the write buffer waits (for a 
response from the external interface) to empty its contents to the external 
interface/memory. 


If the processor executes a load or store instruction requiring external resources
when the write buffer is full, the pipeline is stalled until the write buffer has
space for the data to be stored.
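A software model of the four-entry write buffer is sketched below. It assumes first-in, first-out draining to the external interface, which the text does not state explicitly, and treats the 4-bit size code as opaque; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4   /* four entries, as in Figure 4-20 */

typedef struct {
    uint8_t  size;    /* 4-bit size code (encoding not modeled) */
    uint32_t paddr;   /* 32-bit physical address */
    uint64_t data;    /* one doubleword */
} WbEntry;

typedef struct {
    WbEntry  e[WB_ENTRIES];
    unsigned head;    /* oldest entry, next to be drained */
    unsigned count;   /* number of occupied entries */
} WriteBuffer;

/* Accept a write; false means the buffer is full and the pipeline
   would stall. */
bool wb_push(WriteBuffer *wb, uint8_t size, uint32_t paddr, uint64_t data)
{
    if (wb->count == WB_ENTRIES)
        return false;
    unsigned tail = (wb->head + wb->count) % WB_ENTRIES;
    wb->e[tail] = (WbEntry){ (uint8_t)(size & 0xFu), paddr, data };
    wb->count++;
    return true;
}

/* Hand the oldest entry to the external interface (FIFO order assumed). */
bool wb_pop(WriteBuffer *wb, WbEntry *out)
{
    if (wb->count == 0)
        return false;
    *out = wb->e[wb->head];
    wb->head = (wb->head + 1) % WB_ENTRIES;
    wb->count--;
    return true;
}
```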
The VR4300 processor provides a full-featured memory management unit (MMU)
which uses an on-chip translation lookaside buffer (TLB) to translate virtual
addresses into physical addresses.


This chapter describes the operation of the TLB, the System Control
Coprocessor (CP0) registers that provide the software interface to the TLB, and the
memory mapping method that translates virtual addresses to physical
addresses.
5.1 Translation Lookaside Buffer (TLB)


A virtual address is converted into a physical address by using the internal TLB*.
The internal TLB is a fully associative memory with 32 entries; each entry maps
a pair of pages, one even-numbered and one odd-numbered. The page size can be
4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, or 16 MB, and can be specified for each
entry. When a virtual address is presented, all 32 entries are checked to determine
whether the virtual address, extended with the ASID field held in the EntryHi
register, matches the address stored in an entry.


If the addresses match (a hit occurs), a physical address is generated from the
page frame number in the TLB entry and the offset.


If no entry matches (a miss occurs), an exception occurs, and software writes the
TLB entry from a page table in memory. The software either writes the TLB entry
over the entry selected by the Index register, or writes it to a random entry
indicated by the Random register.


If two or more TLB entries match, the TLB does not operate correctly. In this
case, the TLB-Shutdown (TS) bit of the Status register is set to 1, and the TLB
can no longer be used.
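The even/odd page pairing can be illustrated with a sketch that translates an address through a single TLB entry. It assumes the MIPS convention that the address bit just above the page offset selects the even or odd page, and it stores physical base addresses rather than raw PFN fields; the structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded TLB entry mapping an even/odd page pair. */
typedef struct {
    uint32_t page_size;   /* 4 KB ... 16 MB */
    uint32_t pair_base;   /* VA of the even page, aligned to 2 * page_size */
    uint32_t phys_even;   /* physical base address of the even page */
    uint32_t phys_odd;    /* physical base address of the odd page */
} TlbPair;

/* Page sizes supported: 4 KB multiplied by successive powers of 4. */
bool valid_page_size(uint32_t size)
{
    for (uint32_t s = 4u * 1024u; s <= 16u * 1024u * 1024u; s *= 4u)
        if (s == size)
            return true;
    return false;
}

/* Translate va through this entry if it falls in the page pair. */
bool tlb_pair_translate(const TlbPair *t, uint32_t va, uint32_t *pa)
{
    uint32_t pair_span = 2u * t->page_size;
    if ((va & ~(pair_span - 1u)) != t->pair_base)
        return false;                            /* not this entry */
    uint32_t offset = va & (t->page_size - 1u);  /* passes through unchanged */
    bool odd = (va & t->page_size) != 0;         /* even/odd page select bit */
    *pa = (odd ? t->phys_odd : t->phys_even) + offset;
    return true;
}
```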


5.2 Memory Management System Architecture 


The memory management system expands the address space of the CPU by 
converting a large virtual memory space into physical addresses. 


The physical address space of the Vp4300 is 4 GB, using 32-bit addresses. A 
virtual address is 32 bits wide in 32-bit mode, and the maximum 
user area is 2 GB (2^31 bytes). In 64-bit mode, a virtual address is 64 bits wide, 
and the maximum user area is 1 TB (2^40 bytes). For the TLB entry format in each mode, 
refer to 5.3.1. 


The virtual address is extended with an address space ID (ASID) (refer to Figures 
5-2 and 5-3). The ASID reduces the number of TLB flushes needed when the context 
is switched. The ASID field is 8 bits wide and is held in the EntryHi register of CP0. 

The global bit (G) is in the EntryLo0 and EntryLo1 registers. 


* There are virtual-to-physical address translations that occur outside of the TLB. For example, 
addresses in the kseg0 and kseg1 spaces are unmapped translations. In these spaces the physical 
address is derived by subtracting the base address of the space from the virtual address. 
Virtual address: ASID | VPN | Offset 

1. The virtual page number (VPN, the high-order bits of the virtual 
   address) and the ASID are compared with the corresponding fields of 
   each TLB entry. 

2. If there is a match, the page frame number (PFN), representing the 
   high-order bits of the physical address (PA), is output from the TLB. 

3. The Offset, which does not pass through the TLB, is then concatenated 
   with the PFN. 

Physical address: PFN | Offset 


Figure 5-1 Overview of a Virtual-to-Physical Address Translation 
Virtual-to-Physical Address Translation 
Converting a virtual address to a physical address begins by comparing the virtual 
address from the processor with the virtual addresses in the TLB; there is a match 
when the virtual page number (VPN) of the address is the same as the VPN field 
of the entry, and either: 

• the Global (G) bit of the TLB entry is set, or 
• the ASID field of the virtual address is the same as the ASID field of 
  the TLB entry. 


This match is referred to as a TLB hit. If there is no match, the processor takes a 
TLB Miss exception, and software can then reference a page table of 
virtual/physical addresses in memory and write its contents to the TLB. 


If there is a virtual address match in the TLB, the page frame number (PFN) is output from 
the TLB and concatenated with the Offset, which represents an address within the 
page frame space. The Offset does not pass through the TLB; the low-order bits of 
the virtual address are output as is. 


For details, refer to 5.4.9 Virtual-to-Physical Address Translation Process. 


The next two sections describe the 32-bit and 64-bit address translations. 
32-bit Mode Address Translation 


Figure 5-2 shows the virtual-to-physical address translation of a 32-bit mode 
address. The figure illustrates two of the seven possible page sizes: a 
4 KB page (12 bits) and a 16 MB page (24 bits). 

• The top portion of Figure 5-2 shows a virtual address with a 12-bit, 
  or 4 KB, page size, labelled Offset. The remaining 20 bits of the 
  address, excluding the ASID, represent the VPN and index the 1M-entry 
  page table. 

• The bottom portion of Figure 5-2 shows a virtual address with a 24-bit, 
  or 16 MB, page size, labelled Offset. The remaining 8 bits of the 
  address, excluding the ASID, represent the VPN and index the 256-entry 
  page table. 


Virtual address with 1M (2^20) 4 KB pages (bit range, field, width in bits): 
  39:32 ASID (8) | 31:12 VPN (20; 20 bits = 1M pages) | 11:0 Offset (12) 

Virtual address with 256 (2^8) 16 MB pages: 
  39:32 ASID (8) | 31:24 VPN (8; 8 bits = 256 pages) | 23:0 Offset (24) 

Bits 31, 30, and 29 of the virtual address select the User, Supervisor, or 
Kernel address space. The VPN is translated in the TLB into the PFN, and the 
Offset is passed unchanged to physical memory. The result is a 32-bit physical 
address (bits 31:0) made up of the PFN and the Offset. 

Figure 5-2 32-Bit Mode Virtual Address Translation 
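The field widths in Figure 5-2 follow from simple arithmetic: for a page of 2^n bytes, the low n bits of the address form the Offset and the remaining 32 - n bits form the VPN, so the VPN can index 2^(32 - n) pages. The helper functions below are hypothetical and ignore the ASID; they only illustrate that split.

```c
#include <stdint.h>

/* Offset: the low offset_bits bits (12 for 4 KB pages, 24 for 16 MB pages). */
uint32_t va32_offset(uint32_t va, unsigned offset_bits)
{
    return va & ((1u << offset_bits) - 1u);
}

/* VPN: the remaining high-order bits of the 32-bit virtual address. */
uint32_t va32_vpn(uint32_t va, unsigned offset_bits)
{
    return va >> offset_bits;
}

/* Number of pages the VPN can index: 2^(32 - offset_bits). */
uint64_t va32_page_count(unsigned offset_bits)
{
    return 1ull << (32u - offset_bits);
}
```

For example, `va32_page_count(12)` gives the 1M pages of the top half of the figure and `va32_page_count(24)` the 256 pages of the bottom half.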
64-bit Mode Address Translation 


Figure 5-3 shows the virtual-to-physical address translation of a 64-bit mode 
address. The figure illustrates two of the seven possible page sizes: a 
4 KB page (12 bits) and a 16 MB page (24 bits). 

• The top portion of Figure 5-3 shows a virtual address with a 
  12-bit, or 4 KB, page size, labelled Offset. The 28 bits above the 
  Offset (bits 39:12), excluding the ASID, represent the VPN and index 
  the 256M-entry page table. 

• The bottom portion of Figure 5-3 shows a virtual address with a 24-bit, 
  or 16 MB, page size, labelled Offset. The 16 bits above the Offset 
  (bits 39:24), excluding the ASID, represent the VPN and index the 
  64K-entry page table. 


Virtual address with 256M (2^28) 4 KB pages (bit range, field, width in bits): 
  71:64 ASID (8) | 63:62 (2) | 61:40 0 or -1 (22) | 39:12 VPN (28; 28 bits = 256M pages) | 11:0 Offset (12) 

Virtual address with 64K (2^16) 16 MB pages: 
  71:64 ASID (8) | 63:62 (2) | 61:40 0 or -1 (22) | 39:24 VPN (16; 16 bits = 64K pages) | 23:0 Offset (24) 

Bits 63 and 62 of the virtual address select the User, Supervisor, or Kernel 
address space; bits 61:40 hold 0 or -1 (all 0s or all 1s). The VPN is 
translated in the TLB into the PFN, and the Offset is passed unchanged to 
physical memory. The result is a 32-bit physical address (bits 31:0) made up 
of the PFN and the Offset. 

Figure 5-3 64-Bit Mode Virtual Address Translation 
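In 64-bit mode the VPN always ends at bit 39, so for a page of 2^n bytes it is 40 - n bits wide (28 bits for 4 KB pages, 16 bits for 16 MB pages). The sketch below, with a hypothetical `va64_split` helper and `va64_fields` structure, extracts the region bits, checks that bits 61:40 are all 0s or all 1s as Figure 5-3 shows, and separates the VPN from the Offset.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned region;   /* bits 63:62: User, Supervisor, or Kernel space */
    bool     fill_ok;  /* bits 61:40 are all 0s or all 1s */
    uint64_t vpn;      /* bits 39:offset_bits */
    uint64_t offset;   /* bits offset_bits-1:0 */
} va64_fields;

/* offset_bits is 12 for 4 KB pages and 24 for 16 MB pages (Figure 5-3). */
va64_fields va64_split(uint64_t va, unsigned offset_bits)
{
    const uint64_t fill_mask = (1ull << 22) - 1ull;   /* 22 bits: 61:40 */
    uint64_t fill = (va >> 40) & fill_mask;
    va64_fields f;
    f.region  = (unsigned)(va >> 62);
    f.fill_ok = (fill == 0) || (fill == fill_mask);
    f.vpn     = (va >> offset_bits) & ((1ull << (40u - offset_bits)) - 1ull);
    f.offset  = va & ((1ull << offset_bits) - 1ull);
    return f;
}
```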
5.2.1 Operating Modes 


The processor has three operating modes that function in both 32- and 64-bit 
operations: 


• User mode 
• Supervisor mode 
• Kernel mode 


User mode and Kernel mode are common to all Vp Series members. 
Generally, the operating system runs in Kernel mode, and application 
programs run in User mode. The Vp4000 series also provides a third mode, 
called Supervisor mode, which is intermediate between User and Kernel 
modes and is used to build high-security systems. 


If an exception occurs, the CPU enters the Kernel mode, and remains in this mode 
until an exception return instruction (ERET) is executed. The ERET instruction 
restores the mode in which the processor was operating before the occurrence of 
the exception. 


5.2.2 Virtual Addressing in User Mode 


In User mode, a single, uniform virtual address space (useg) of 2 GB (2^31 bytes) can be 
used in 32-bit mode, and a 1 TB (2^40 bytes) virtual address space (xuseg) can 
be used in 64-bit mode. As shown in Figures 5-2 and 5-3, each virtual address 
is extended into a unique virtual address by an 8-bit address space ID (ASID), 
supporting up to 256 user processes. The system assigns an ASID to each process 
so that the contents of the TLB can be retained across context switches. useg and 
xuseg are referenced via the TLB. Whether the cache is used is determined 
for each page by its TLB entry (the C bit of the TLB entry determines whether 
the cache is used). 


The user segment starts from address 0 and the currently valid user process resides 
in useg (in the 32-bit mode) or xuseg (in the 64-bit mode). 


The Vp4300 operates in User mode when the bits of the Status 
register are set as follows: 

• KSU = 10 
• EXL = 0 
• ERL = 0 

In conjunction with these bits, the UX bit in the Status register selects between 32- 
or 64-bit User mode addressing as follows: 
• UX = 0: Selects 32-bit useg. 
  A TLB miss is processed by the 32-bit TLB miss exception handler. 

• UX = 1: Selects 64-bit xuseg. 
  A TLB miss is processed by the 64-bit XTLB miss exception handler. 


Table 5-1 lists the characteristics of the two user mode segments, useg and xuseg. 


Address map (highest addresses first): 

32-bit* 
  0x8000 0000 through 0xFFFF FFFF   Address Error 
  0x0000 0000 through 0x7FFF FFFF   useg    2 GB   TLB mapped 

64-bit 
  0x0000 0100 0000 0000 through 0xFFFF FFFF FFFF FFFF   Address Error 
  0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF   xuseg   1 TB   TLB mapped 

* The Vp4300 internally uses 64-bit addresses. In Kernel mode, the processor 
  saves and restores each 64-bit register to initialize it before switching 
  the context. In 32-bit mode, a 32-bit value is used as an address, with 
  bit 31 sign-extended into bits 32 through 63. 
  Usually, a program in 32-bit mode does not generate invalid addresses. 
  However, if the context is switched and the processor enters Kernel mode, a 
  value other than a previously sign-extended 32-bit address may be stored in 
  a 64-bit register. In this case, the program in User mode may generate 
  invalid addresses. 


Figure 5-4 User Mode Virtual Address Space 


Table 5-1 32-Bit and 64-Bit User Mode Segments 
Address Bit Values | KSU | EXL | ERL | UX | Segment Name | Virtual Address Range                                | Segment Size 
A(31) = 0          | 10  | 0   | 0   | 0  | useg         | 0x0000 0000 through 0x7FFF FFFF                      | 2 GB (2^31 bytes) 
A(63:40) = 0       | 10  | 0   | 0   | 1  | xuseg        | 0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF  | 1 TB (2^40 bytes) 


User’s Manual U10504EJ7VOUM00 


Memory Management System 


useg (32-bit mode) 


When the UX bit of the Status register is 0 and the most significant bit of the virtual 
address is 0, this virtual address space is referred to as useg. If an attempt is made 
to reference an address whose most significant bit is 1, an address error exception 
occurs (refer to Chapter 6 Exception Processing). 


xuseg (64-bit mode) 


If the UX bit of the Status register is 1 and bits 63:40 of the virtual address 
are all 0, the virtual address space is referred to as xuseg. A user address space of 
1 TB (2^40 bytes) can be used. If an attempt is made to reference an address that 
has a 1 in any of bits 63:40, an address error exception occurs (refer to Chapter 6 
Exception Processing). 
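The two range checks above can be written as a single predicate. `user_address_ok` is a hypothetical helper that returns false where the processor would take an address error exception; in 32-bit mode it assumes the address has already been sign-extended to 64 bits, as the note to Figure 5-4 describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a User-mode reference to va is legal, false if it
   raises an address error exception. ux is the Status.UX bit. */
bool user_address_ok(uint64_t va, bool ux)
{
    if (!ux)
        return (va >> 31) == 0;   /* useg: sign-extended address with bit 31 = 0 */
    return (va >> 40) == 0;       /* xuseg: bits 63:40 all 0 */
}
```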


5.2.3 Virtual Addressing in Supervisor Mode 


Supervisor mode, shown in Figure 5-5, is intended for a hierarchically 
structured operating system: the kernel at the top of the hierarchy runs in 
Kernel mode, and the other parts of the operating system run in 
Supervisor mode. 


All Supervisor mode spaces (suseg, sseg, xsuseg, xsseg, and csseg) are referenced 
via the TLB. Whether the cache is used is determined by the TLB entry of 
each page (the C bit of the TLB entry determines whether the cache is used). 


The processor operates in Supervisor mode when the bits of the Status register are 
set as follows: 

• KSU = 01 
• EXL = 0 
• ERL = 0 


In addition, the addressing mode in Supervisor mode is determined by the SX 
bit of the Status register: 

• SX = 0: 32-bit supervisor space 
  A TLB miss is processed by the 32-bit TLB miss exception handler. 

• SX = 1: 64-bit supervisor space 
  A TLB miss is processed by the 64-bit XTLB miss exception handler. 


Table 5-2 shows the features of each segment in the supervisor mode. 
Address map (highest addresses first): 

32-bit* 
  0xE000 0000 through 0xFFFF FFFF   Address Error 
  0xC000 0000 through 0xDFFF FFFF   sseg     0.5 GB   TLB mapped 
  0x8000 0000 through 0xBFFF FFFF   Address Error 
  0x0000 0000 through 0x7FFF FFFF   suseg    2 GB     TLB mapped 

64-bit 
  0xFFFF FFFF E000 0000 through 0xFFFF FFFF FFFF FFFF   Address Error 
  0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF   csseg    0.5 GB   TLB mapped 
  0x4000 0100 0000 0000 through 0xFFFF FFFF BFFF FFFF   Address Error 
  0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF   xsseg    1 TB     TLB mapped 
  0x0000 0100 0000 0000 through 0x3FFF FFFF FFFF FFFF   Address Error 
  0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF   xsuseg   1 TB     TLB mapped 

* The Vp4300 internally uses 64-bit addresses. In 32-bit mode, a 32-bit 
  value with bit 31 sign-extended into bits 32 through 63 is used as an address. 
  Normally, a program in 32-bit mode does not generate an invalid address. 
  However, an integer overflow may occur when an address is calculated as 
  base register + offset. The address calculated in this case is invalid, 
  and the result is undefined. The overflow occurs in either of the 
  following two cases: 

  • bit 15 of offset = 0, bit 31 of base register = 0, and bit 31 of 
    (base register + offset) = 1 
  • bit 15 of offset = 1, bit 31 of base register = 1, and bit 31 of 
    (base register + offset) = 0 

Figure 5-5 Supervisor Mode Address Space 
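The two overflow conditions in the note above are the conditions for signed 32-bit overflow. The sketch below (all helper names are hypothetical) checks them directly, and also shows the equivalent view: the 64-bit sum of the sign-extended base and the sign-extended 16-bit offset is then no longer a properly sign-extended 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copy bit 31 of a 32-bit value into bits 63:32. */
uint64_t sign_extend32(uint32_t x)
{
    return (x & 0x80000000u) ? (0xFFFFFFFF00000000ull | x) : (uint64_t)x;
}

/* The two conditions from the note, evaluated on 32-bit values. */
bool addr_overflow(uint32_t base, uint16_t offset)
{
    uint32_t off32 = (offset & 0x8000u) ? (0xFFFF0000u | offset) : offset;
    uint32_t sum   = base + off32;            /* wraps modulo 2^32 */
    bool off15  = (offset >> 15) & 1u;
    bool base31 = (base >> 31) & 1u;
    bool sum31  = (sum >> 31) & 1u;
    return (!off15 && !base31 && sum31) || (off15 && base31 && !sum31);
}

/* 64-bit effective address: sign-extended base + sign-extended offset. */
uint64_t effective_address(uint32_t base, uint16_t offset)
{
    uint64_t off64 = (offset & 0x8000u) ? (0xFFFFFFFFFFFF0000ull | offset)
                                        : (uint64_t)offset;
    return sign_extend32(base) + off64;       /* wraps modulo 2^64 */
}

/* True if a is a valid 32-bit-mode address (bits 63:32 copy bit 31). */
bool is_sign_extended32(uint64_t a)
{
    return a == sign_extend32((uint32_t)a);
}
```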
Table 5-2 32-Bit and 64-Bit Supervisor Mode Segments 

Address Bit Values | KSU | EXL | ERL | SX | Segment Name | Virtual Address Range                                | Segment Size 
A(31) = 0          | 01  | 0   | 0   | 0  | suseg        | 0x0000 0000 through 0x7FFF FFFF                      | 2 GB (2^31 bytes) 
A(31:29) = 110     | 01  | 0   | 0   | 0  | sseg         | 0xC000 0000 through 0xDFFF FFFF                      | 512 MB (2^29 bytes) 
A(63:62) = 00      | 01  | 0   | 0   | 1  | xsuseg       | 0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF  | 1 TB (2^40 bytes) 
A(63:62) = 01      | 01  | 0   | 0   | 1  | xsseg        | 0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF  | 1 TB (2^40 bytes) 
A(63:62) = 11      | 01  | 0   | 0   | 1  | csseg        | 0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF  | 512 MB (2^29 bytes) 


32-bit Supervisor Mode, User Space (suseg) 


In Supervisor mode, when SX = 0 in the Status register and the most-significant 
bit of the virtual address is set to 0, the suseg virtual address space is selected; it 
covers the full 2^31 bytes (2 GB) of the current user address space. The virtual 
address is extended with the contents of the 8-bit ASID field to form a unique 
virtual address. 


32-bit Supervisor Mode, Supervisor Space (sseg) 


In Supervisor mode, when SX = 0 in the Status register and the three high-order 
bits of the virtual address are 110, the sseg virtual address space is selected; it 
covers 2^29 bytes (512 MB) of the current supervisor address space. The virtual 
address is extended with the contents of the 8-bit ASID field to form a unique 
virtual address. 
64-bit Supervisor Mode, User Space (xsuseg) 


In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual 
address are set to 00, the xsuseg virtual address space is selected; it covers the full 
2^40 bytes (1 TB) of the current user address space. The virtual address is extended 
with the contents of the 8-bit ASID field to form a unique virtual address. 


64-bit Supervisor Mode, Current Supervisor Space (xsseg) 


In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual 
address are set to 01, the xsseg current supervisor virtual address space is selected; 
it covers the full 2^40 bytes (1 TB) of the current supervisor address space. The 
virtual address is extended with the contents of the 8-bit ASID field to form a 
unique virtual address. 


64-bit Supervisor Mode, Separate Supervisor Space (csseg) 


In Supervisor mode, when SX = 1 in the Status register and bits 63:62 of the virtual 
address are set to 11, the csseg separate supervisor virtual address space is 
selected. The virtual address is extended with the contents of the 8-bit ASID field 
to form a unique virtual address. 
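Putting the Supervisor mode rules together, the segment that a virtual address selects depends only on the SX bit and the high-order address bits listed in Table 5-2. The classifier below is a sketch with hypothetical names; it treats every address outside the listed segments as an address error, as Figure 5-5 shows, and assumes that 32-bit addresses are already sign-extended.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SEG_ADDR_ERROR, SEG_SUSEG, SEG_SSEG,
               SEG_XSUSEG, SEG_XSSEG, SEG_CSSEG } sup_segment;

/* Classify a Supervisor-mode virtual address; sx is the Status.SX bit. */
sup_segment supervisor_segment(uint64_t va, bool sx)
{
    if (!sx) {
        uint32_t a = (uint32_t)va;             /* 32-bit mode: low word */
        if ((a >> 31) == 0u) return SEG_SUSEG; /* A(31) = 0 */
        if ((a >> 29) == 6u) return SEG_SSEG;  /* A(31:29) = 110 */
        return SEG_ADDR_ERROR;
    }
    switch (va >> 62) {                        /* A(63:62) */
    case 0:  return (va >> 40) == 0 ? SEG_XSUSEG : SEG_ADDR_ERROR;
    case 1:  return ((va >> 40) & 0x3FFFFFull) == 0 ? SEG_XSSEG : SEG_ADDR_ERROR;
    case 3:  return (va >= 0xFFFFFFFFC0000000ull && va <= 0xFFFFFFFFDFFFFFFFull)
                    ? SEG_CSSEG : SEG_ADDR_ERROR;
    default: return SEG_ADDR_ERROR;            /* 10: no Supervisor segment */
    }
}
```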
5.2.4 Virtual Addressing in Kernel Mode 


The processor operates in Kernel mode when the Status register contains one or 
more of the following values: 


• KSU = 00 
• EXL = 1 
• ERL = 1 


In conjunction with these bits, the KX bit in the Status register selects between 32- 
or 64-bit Kernel mode addressing space: 


• KX = 0: 32-bit kernel space is selected. 
  A TLB miss is processed by the 32-bit TLB miss exception handler. 

• KX = 1: 64-bit kernel space is selected. 
  A TLB miss is processed by the 64-bit XTLB miss exception handler. 


The processor enters Kernel mode whenever an exception is detected and it 
remains in Kernel mode until an Exception Return (ERET) instruction is executed 
and results in ERL and/or EXL = 0. The ERET instruction restores the processor 
to the mode existing prior to the exception. 
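The Status register conditions for the three operating modes, together with the UX, SX, and KX bits, can be summarized in a small decoder. The enum and function names are hypothetical. The sketch reports KSU = 11 as undefined because this chapter does not describe that encoding.

```c
#include <stdbool.h>

typedef enum { MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_UNDEFINED } cpu_mode;

/* Decode the operating mode from Status bits KSU (2 bits), EXL, and ERL. */
cpu_mode current_mode(unsigned ksu, bool exl, bool erl)
{
    if (exl || erl || ksu == 0u)   /* KSU = 00, or EXL = 1, or ERL = 1 */
        return MODE_KERNEL;
    if (ksu == 1u)                 /* KSU = 01, EXL = 0, ERL = 0 */
        return MODE_SUPERVISOR;
    if (ksu == 2u)                 /* KSU = 10, EXL = 0, ERL = 0 */
        return MODE_USER;
    return MODE_UNDEFINED;         /* KSU = 11: not described in this chapter */
}

/* UX, SX, or KX selects 64-bit addressing for the current mode. */
bool addressing_is_64bit(cpu_mode m, bool ux, bool sx, bool kx)
{
    switch (m) {
    case MODE_USER:       return ux;
    case MODE_SUPERVISOR: return sx;
    case MODE_KERNEL:     return kx;
    default:              return false;
    }
}
```

Note that an exception (EXL = 1) forces Kernel mode regardless of KSU, which is why the kernel check comes first.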


Kernel mode virtual address space is divided into regions differentiated by the 
high-order bits of the virtual address, as shown in Figure 5-6. Table 5-3 lists the 
characteristics of the 32-bit kernel mode segments, and Table 5-4 lists the 
characteristics of the 64-bit kernel mode segments. 
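Address map (highest addresses first): 

32-bit* 
  0xE000 0000 through 0xFFFF FFFF   kseg3    0.5 GB   TLB mapped 
  0xC000 0000 through 0xDFFF FFFF   ksseg    0.5 GB   TLB mapped 
  0xA000 0000 through 0xBFFF FFFF   kseg1    0.5 GB   TLB unmapped, uncached 
  0x8000 0000 through 0x9FFF FFFF   kseg0    0.5 GB   TLB unmapped, cacheable 
  0x0000 0000 through 0x7FFF FFFF   kuseg    2 GB     TLB mapped 

64-bit 
  0xFFFF FFFF E000 0000 through 0xFFFF FFFF FFFF FFFF   ckseg3   0.5 GB   TLB mapped 
  0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF   cksseg   0.5 GB   TLB mapped 
  0xFFFF FFFF A000 0000 through 0xFFFF FFFF BFFF FFFF   ckseg1   0.5 GB   TLB unmapped, uncached 
  0xFFFF FFFF 8000 0000 through 0xFFFF FFFF 9FFF FFFF   ckseg0   0.5 GB   TLB unmapped, cacheable 
  0xC000 00FF 8000 0000 through 0xFFFF FFFF 7FFF FFFF   Address Error 
  0xC000 0000 0000 0000 through 0xC000 00FF 7FFF FFFF   xkseg             TLB mapped 
  0x8000 0000 0000 0000 through 0xBFFF FFFF FFFF FFFF   xkphys            TLB unmapped (for details, refer to Figure 5-7) 
  0x4000 0100 0000 0000 through 0x7FFF FFFF FFFF FFFF   Address Error 
  0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF   xksseg   1 TB     TLB mapped 
  0x0000 0100 0000 0000 through 0x3FFF FFFF FFFF FFFF   Address Error 
  0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF   xkuseg   1 TB     TLB mapped 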

* The Vp4300 internally uses 64-bit addresses. In 32-bit mode, a 32-bit 
  value with bit 31 sign-extended into bits 32 through 63 is used as an address. 
  Normally, a program in 32-bit mode does not use 64-bit instructions. 
  However, an integer overflow may occur when an address is calculated as 
  base register + offset. The address calculated in this case is invalid, 
  and the result is undefined. The overflow occurs in either of the 
  following two cases: 

  • bit 15 of offset = 0, bit 31 of base register = 0, and bit 31 of 
    (base register + offset) = 1 
  • bit 15 of offset = 1, bit 31 of base register = 1, and bit 31 of 
    (base register + offset) = 0 


Figure 5-6 Kernel Mode Address Space 
Figure 5-7 Details of xkphys Field 
Table 5-3 32-Bit Kernel Mode Segments 

Status register bit values for all segments: KSU = 00, or EXL = 1, or ERL = 1; KX = 0. 

Address Bit Values | Segment Name | Virtual Address                  | Physical Address                 | Segment Size 
A(31) = 0          | kuseg        | 0x0000 0000 through 0x7FFF FFFF  | TLB map                          | 2 GB (2^31 bytes) 
A(31:29) = 100     | kseg0        | 0x8000 0000 through 0x9FFF FFFF  | 0x0000 0000 through 0x1FFF FFFF  | 512 MB (2^29 bytes) 
A(31:29) = 101     | kseg1        | 0xA000 0000 through 0xBFFF FFFF  | 0x0000 0000 through 0x1FFF FFFF  | 512 MB (2^29 bytes) 
A(31:29) = 110     | ksseg        | 0xC000 0000 through 0xDFFF FFFF  | TLB map                          | 512 MB (2^29 bytes) 
A(31:29) = 111     | kseg3        | 0xE000 0000 through 0xFFFF FFFF  | TLB map                          | 512 MB (2^29 bytes) 


32-bit Kernel Mode, User Space (kuseg) 


In Kernel mode, when KX = 0 in the Status register, and the most-significant bit 
of the virtual address is cleared, the kuseg virtual address space is selected; it 
covers the current 2^31-byte (2 GB) user address space. The virtual address is 
extended with the contents of the 8-bit ASID field to form a unique virtual 
address. 


This space is referenced via TLB. Whether the cache can be used or not is 
determined by the value of the C bit of the TLB entry of each page. 


If the ERL bit of the Status register is 1, the user address area becomes a 2 GB 
area that is neither mapped through the TLB nor cached (that is, the virtual 
addresses are used as physical addresses as is). This function is used by the 
Vp4400 to process ECC errors in an exception handler. The Vp4300 has no ECC 
or parity function; it provides this function only to maintain compatibility 
with the Vp4400. 
32-bit Kernel Mode, Kernel Space 0 (kseg0) 


In Kernel mode, when KX = 0 in the Status register and the high-order three bits 
of the virtual address are 100, the kseg0 virtual address space is selected; it covers the 
current 2^29-byte (512 MB) address space. 


References to kseg0 are not mapped through the TLB; the physical address 
selected is defined by subtracting 0x8000 0000 from the virtual address. 


The K0 field of the Config register controls cacheability. (Refer to Chapter 6 
Exception Processing.) 


32-bit Kernel Mode, Kernel Space 1 (kseg1) 


In Kernel mode, when KX = 0 in the Status register and the high-order three bits 
of the virtual address are 101, the kseg1 virtual address space is selected; it covers the 
current 2^29-byte (512 MB) address space. 


References to kseg1 are not mapped through the TLB; the physical address 
selected is defined by subtracting 0xA000 0000 from the virtual address. 


Caches are disabled for accesses to these addresses, and physical memory (or 
memory-mapped I/O device registers) is accessed directly. 
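Because kseg0 and kseg1 are unmapped, their translation is a subtraction of the segment base, as described above. In the sketch below, `kseg01_translate` is a hypothetical helper, and its `k0_cached` parameter stands in for the cacheability that the K0 field of the Config register selects for kseg0; the encoding of K0 itself is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Translate a 32-bit Kernel mode address in kseg0 or kseg1.
   Returns false for kuseg, ksseg, and kseg3, which are mapped through the TLB. */
bool kseg01_translate(uint32_t va, bool k0_cached, uint32_t *pa, bool *cached)
{
    switch (va >> 29) {             /* A(31:29) */
    case 0x4u:                      /* 100: kseg0 */
        *pa = va - 0x80000000u;
        *cached = k0_cached;        /* stands in for Config.K0 */
        return true;
    case 0x5u:                      /* 101: kseg1 */
        *pa = va - 0xA0000000u;
        *cached = false;            /* caches are always disabled */
        return true;
    default:
        return false;
    }
}
```

Both segments therefore alias the same low 512 MB of physical memory; the choice between them only decides whether the access goes through the cache.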


32-bit Kernel Mode, Supervisor Space (ksseg) 


In Kernel mode, when KX = 0 in the Status register and the high-order three bits 
of the virtual address are 110, the ksseg virtual address space is selected; it covers 
the current 2^29-byte (512 MB) virtual address space. The virtual address is 
extended with the contents of the 8-bit ASID field to form a unique virtual 
address. 


This space is referenced via TLB. Whether the cache can be used or not is 
determined by the value of the C bit of the TLB entry of each page. 


32-bit Kernel Mode, Kernel Space 3 (kseg3) 


In Kernel mode, when KX = 0 in the Status register and the high-order three bits 
of the virtual address are 111, the kseg3 virtual address space is selected; it covers the 
current 2^29-byte (512 MB) virtual address space. The virtual address is extended 
with the contents of the 8-bit ASID field to form a unique virtual address. 


This space is referenced via TLB. Whether the cache can be used or not is 
determined by the value of the C bit of the TLB entry of each page. 
Table 5-4 64-Bit Kernel Mode Segments 

Status register bit values for all segments: KSU = 00, or EXL = 1, or ERL = 1; KX = 1. 

Address Bit Values            | Segment Name | Virtual Address                                      | Physical Address                 | Segment Size 
A(63:62) = 00                 | xkuseg       | 0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF  | TLB map                          | 1 TB (2^40 bytes) 
A(63:62) = 01                 | xksseg       | 0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF  | TLB map                          | 1 TB (2^40 bytes) 
A(63:62) = 10                 | xkphys       | 0x8000 0000 0000 0000 through 0xBFFF FFFF FFFF FFFF  | 0x0000 0000 through 0xFFFF FFFF  | 2^32 bytes (refer to 64-bit Kernel Mode, Physical Spaces (xkphys)) 
A(63:62) = 11                 | xkseg        | 0xC000 0000 0000 0000 through 0xC000 00FF 7FFF FFFF  | TLB map                          | 2^40 - 2^31 bytes 
A(63:62) = 11, A(61:31) = -1  | ckseg0       | 0xFFFF FFFF 8000 0000 through 0xFFFF FFFF 9FFF FFFF  | 0x0000 0000 through 0x1FFF FFFF  | 512 MB (2^29 bytes) 
A(63:62) = 11, A(61:31) = -1  | ckseg1       | 0xFFFF FFFF A000 0000 through 0xFFFF FFFF BFFF FFFF  | 0x0000 0000 through 0x1FFF FFFF  | 512 MB (2^29 bytes) 
A(63:62) = 11, A(61:31) = -1  | cksseg       | 0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF  | TLB map                          | 512 MB (2^29 bytes) 
A(63:62) = 11, A(61:31) = -1  | ckseg3       | 0xFFFF FFFF E000 0000 through 0xFFFF FFFF FFFF FFFF  | TLB map                          | 512 MB (2^29 bytes) 
64-bit Kernel Mode, User Space (xkuseg) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the virtual 
address are 00, the xkuseg virtual address space is selected; it covers the current 
2^40-byte (1 TB) user address space. The virtual address is extended with the 
contents of the 8-bit ASID field to form a unique virtual address. 


This space is referenced via TLB. Whether the cache can be used or not is 
determined by the value of the C bit of the TLB entry of each page. 


If the ERL bit of the Status register is 1, the user address area becomes a 2 GB 
area that is neither mapped through the TLB nor cached (that is, the virtual 
addresses are used as physical addresses as is). This function is used by the 
Vp4400 to process ECC errors in an exception handler. The Vp4300 has no ECC 
or parity function; it provides this function only to maintain compatibility 
with the Vp4400. 


64-bit Kernel Mode, Current Supervisor Space (xksseg) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the virtual 
address are 01, the xksseg virtual address space is selected; it covers the current 
supervisor virtual space. The virtual address is extended with the contents of the 
8-bit ASID field to form a unique virtual address. 


This space is referenced via TLB. Whether the cache can be used or not is 
determined by the value of the C bit of the TLB entry of each page. 


64-bit Kernel Mode, Physical Spaces (xkphys) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the virtual 
address are 10, one of the eight unmapped xkphys address spaces is selected, 
either cached or uncached. Bits 31:0 of the virtual address are used as is as 
the physical address. An access with a 1 in any of address bits 58:32 causes an 
address error. 


Use of the cache is indicated by bits 61 through 59 of the virtual address. Table 
5-5 shows the eight address spaces and use of the corresponding cache. 
Table 5-5 Use of Cache and xkphys Address Space 

Bits 61:59 | Use of Cache | Address 
0          | Used         | 0x8000 0000 0000 0000 through 0x8000 0000 FFFF FFFF 
1          | Used         | 0x8800 0000 0000 0000 through 0x8800 0000 FFFF FFFF 
2          | Not used     | 0x9000 0000 0000 0000 through 0x9000 0000 FFFF FFFF 
3          | Used         | 0x9800 0000 0000 0000 through 0x9800 0000 FFFF FFFF 
4          | Used         | 0xA000 0000 0000 0000 through 0xA000 0000 FFFF FFFF 
5          | Used         | 0xA800 0000 0000 0000 through 0xA800 0000 FFFF FFFF 
6          | Used         | 0xB000 0000 0000 0000 through 0xB000 0000 FFFF FFFF 
7          | Used         | 0xB800 0000 0000 0000 through 0xB800 0000 FFFF FFFF 
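Decoding an xkphys address needs only the bit fields given above and in Table 5-5. The helper below is hypothetical: it reports an address error when any of bits 58:32 is 1, takes bits 61:59 as the cache selector (only the value 2 means the cache is not used), and passes bits 31:0 through as the physical address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode an xkphys address (A(63:62) = 10). Returns false if the address
   is not in xkphys or if any of bits 58:32 is 1 (address error). */
bool xkphys_decode(uint64_t va, uint32_t *pa, unsigned *cache_bits, bool *cached)
{
    if ((va >> 62) != 2u)
        return false;
    if (((va >> 32) & 0x7FFFFFFull) != 0)           /* bits 58:32 (27 bits) */
        return false;
    *cache_bits = (unsigned)((va >> 59) & 0x7u);    /* bits 61:59 */
    *cached = (*cache_bits != 2u);                  /* Table 5-5: only 2 is "Not used" */
    *pa = (uint32_t)va;                             /* bits 31:0 used as is */
    return true;
}
```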


64-bit Kernel Mode, Kernel Space (xkseg) 


In Kernel mode, when KX = 1 in the Status register and bits 63:62 of the virtual 
address are 11, the address space is referred to as xkseg. The address space selected 
is one of the following: 

• Kernel virtual space, xkseg, the current kernel virtual space; the virtual 
  address is extended with the contents of the 8-bit ASID field to form a 
  unique virtual address. 
  This space is referenced via TLB. Whether the cache can be used or 
  not is determined by the value of the C bit of the TLB entry of each 
  page. 

• One of the four 32-bit kernel compatibility spaces, as described in the 
  next section. 
64-bit Kernel Mode, Compatibility Spaces (ckseg1:0, cksseg, ckseg3) 


In Kernel mode, when KX = 1 in the Status register, bits 63:62 of the 64-bit virtual 
address are 11, and bits 61:31 of the virtual address are all 1s (so that bits 63:32 
are 0xFFFF FFFF and bits 31:16 are in the range 0x8000 through 0xFFFF), the 
virtual address selects one of the following 512 MB compatibility spaces, as shown 
in Figure 5-6. 


ckseg0. This space is an unmapped region, compatible with the kseg0 
space in 32-bit mode. The K0 field of the Config register controls 
cacheability and coherency. 


ckseg1. This space is an unmapped and uncached region, compatible 
with the kseg1 space in 32-bit mode. 


cksseg. This space is the current supervisor virtual space, compatible 
with the ksseg space in 32-bit mode. 

This space is referenced via TLB. Whether the cache can be used or 
not is determined by the value of the C bit of the TLB entry of each 
page. 

ckseg3. This space is the current kernel virtual space, compatible 
with the kseg3 space in 32-bit mode. 

This space is referenced via TLB. Whether the cache can be used or 
not is determined by the value of the C bit of the TLB entry of each 
page. 
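Combining Table 5-4 with the compatibility-space rule gives a classifier for 64-bit Kernel mode addresses. The enum and function are hypothetical names; ranges that Table 5-4 does not list are reported as address errors, and xkphys addresses with any of bits 58:32 set are also treated as errors, as described for xkphys above.

```c
#include <stdint.h>

typedef enum {
    K_ADDR_ERROR, K_XKUSEG, K_XKSSEG, K_XKPHYS, K_XKSEG,
    K_CKSEG0, K_CKSEG1, K_CKSSEG, K_CKSEG3
} k64_segment;

/* Classify a 64-bit Kernel mode (KX = 1) virtual address. */
k64_segment kernel64_segment(uint64_t va)
{
    switch (va >> 62) {                                    /* A(63:62) */
    case 0:  return (va >> 40) == 0 ? K_XKUSEG : K_ADDR_ERROR;
    case 1:  return ((va >> 40) & 0x3FFFFFull) == 0 ? K_XKSSEG : K_ADDR_ERROR;
    case 2:  return ((va >> 32) & 0x7FFFFFFull) == 0 ? K_XKPHYS : K_ADDR_ERROR;
    default: break;                                        /* A(63:62) = 11 */
    }
    if (va >= 0xFFFFFFFF80000000ull) {                     /* A(61:31) all 1s */
        switch ((va >> 29) & 0x3u) {                       /* A(30:29) */
        case 0:  return K_CKSEG0;
        case 1:  return K_CKSEG1;
        case 2:  return K_CKSSEG;
        default: return K_CKSEG3;
        }
    }
    return va <= 0xC00000FF7FFFFFFFull ? K_XKSEG : K_ADDR_ERROR;
}
```

A sign-extended 32-bit kernel address such as 0xFFFF FFFF 8000 1000 therefore lands in ckseg0, which is what lets 32-bit kernel code run unchanged with KX = 1.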
5.3 System Control Coprocessor 


The System Control Coprocessor (CP0) is implemented as an integral part of the 
CPU, and supports memory management, address translation, exception handling, 
and other privileged operations. CP0 contains the registers shown in Figure 5-8 
plus a 32-entry TLB. The sections that follow describe how the processor uses 
each of the TLB-related registers. 


Remark Each register is assigned a number called a register number. For 
details, refer to Chapter 1 General. For the relations among the CP0 
functions, exception processing, and registers, refer to Chapter 6 
Exception Processing. 


Figure 5-8 groups the CP0 registers into those used with the memory management 
system and those used with exception processing, and shows the 32-entry TLB 
(entries 0 through 31; bits 127/255 through 0 in 32-/64-bit mode), with the 
"safe" entries set by the Wired register. Registers (register numbers in 
parentheses): Index (0), Random (1), EntryLo0 (2), EntryLo1 (3), Context (4), 
PageMask (5), Wired (6), BadVAddr (8), Count (9), EntryHi (10), Compare (11), 
Status (12), Cause (13), EPC (14), PRId (15), Config (16), LLAddr (17), 
WatchLo (18), WatchHi (19), XContext (20), Parity Error (26), CacheErr (27), 
TagLo (28), TagHi (29), ErrorEPC (30). 

Figure 5-8 CP0 Registers and the TLB 
5.3.1 Format of a TLB Entry 


Figure 5-9 shows the TLB entry formats for both 32- and 64-bit modes. Each field 
of an entry has a corresponding field in the EntryHi, EntryLo0, EntryLo1, or 
PageMask registers. 

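32-bit Mode (bit range, field, width in bits) 
  127:121 0 (7) | 120:109 MASK (12) | 108:96 0 (13) 
  95:77 VPN2 (19) | 76 G (1) | 75:72 0 (4) | 71:64 ASID (8) 
  63:58 0 (6) | 57:38 PFN (20) | 37:35 C (3) | 34 D (1) | 33 V (1) | 32 0 (1) 
  31:26 0 (6) | 25:6 PFN (20) | 5:3 C (3) | 2 D (1) | 1 V (1) | 0 0 (1) 

64-bit Mode 
  255:217 0 (39) | 216:205 MASK (12) | 204:192 0 (13) 
  191:190 R (2) | 189:168 Fill (22) | 167:141 VPN2 (27) | 140 G (1) | 139:136 0 (4) | 135:128 ASID (8) 
  127:90 0 (38) | 89:70 PFN (20) | 69:67 C (3) | 66 D (1) | 65 V (1) | 64 0 (1) 
  63:26 0 (38) | 25:6 PFN (20) | 5:3 C (3) | 2 D (1) | 1 V (1) | 0 0 (1) 


Figure 5-9 TLB Entry Format 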
The formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers are 
almost the same as the TLB entry. However, the EntryHi register does not hold 
the G bit of the TLB entry; that bit position is reserved in EntryHi. 


PageMask Register (bit range, field, width in bits) 
  31:25 0 (7) | 24:13 MASK (12) | 12:0 0 (13) 

EntryHi Register, 32-bit mode 
  31:13 VPN2 (19) | 12:8 0 (5) | 7:0 ASID (8) 

EntryHi Register, 64-bit mode 
  63:62 R (2) | 61:40 Fill (22) | 39:13 VPN2 (27) | 12:8 0 (5) | 7:0 ASID (8) 

MASK : Page comparison mask. Determines the virtual page size of the corresponding entry. 
VPN2 : Virtual page number divided by two (maps to two pages). 
ASID : Address space ID field. An 8-bit field that lets multiple processes share the TLB; 
       virtual addresses for each process can be shared. 
R    : Region (00: user, 01: supervisor, 11: kernel); used to match vAddr63..62. 
Fill : RFU. Writes to this field are ignored; 0 is returned when it is read. 
0    : RFU. Must be written as zeroes, and returns zeroes when read. 


Figure 5-10 TLB Entry Registers (1/2) 
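As an illustration of the register layouts in Figure 5-10, the sketch below builds PageMask and EntryHi values in software. The function names are hypothetical, and the MASK encodings for the seven page sizes (0x000 for 4 KB, two more 1 bits for each fourfold larger page, up to 0xFFF for 16 MB) are stated here as an assumption, since this section does not list them. The Fill field and bits 12:8 are left as 0.

```c
#include <stdint.h>

/* PageMask value for a page size of 4 KB, 16 KB, ..., 16 MB (assumed
   encoding). Sizes between the seven legal values round up. */
uint32_t pagemask_for_size(uint32_t page_bytes)
{
    uint32_t mask = 0;
    for (uint32_t size = 4096u; size < page_bytes && size < (16u << 20); size <<= 2)
        mask = (mask << 2) | 0x3u;   /* each x4 step adds two 1 bits */
    return mask << 13;               /* MASK occupies bits 24:13 */
}

/* 32-bit mode EntryHi: VPN2 = bits 31:13, bits 12:8 = 0, ASID = bits 7:0. */
uint32_t entryhi32(uint32_t va, uint8_t asid)
{
    return (va & 0xFFFFE000u) | asid;
}

/* 64-bit mode EntryHi: R = bits 63:62, Fill (61:40) and bits 12:8 left 0,
   VPN2 = bits 39:13, ASID = bits 7:0. */
uint64_t entryhi64(uint64_t va, uint8_t asid)
{
    return (va & 0xC000000000000000ull)   /* R: region */
         | (va & 0x000000FFFFFFE000ull)   /* VPN2 */
         | asid;
}
```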
EntryLo0 and EntryLo1 Registers (bit range, field, width in bits) 
  32-bit mode: 31:26 0 (6) | 25:6 PFN (20) | 5:3 C (3) | 2 D (1) | 1 V (1) | 0 G (1) 
  64-bit mode: 63:26 0 (38) | 25:6 PFN (20) | 5:3 C (3) | 2 D (1) | 1 V (1) | 0 G (1) 

PFN : Page frame number; the high-order bits of the physical address. 
C   : Specifies the TLB page attribute; refer to Table 5-6. 
D   : Dirty. If this bit is set, the page is marked as dirty and, therefore, writable. This bit is 
      actually a write-protect bit that software can use to prevent alteration of data. 
V   : Valid. If this bit is set, it indicates that the TLB entry is valid; otherwise, a TLBL or 
      TLBS miss occurs. 
G   : Global. If this bit is set in both EntryLo0 and EntryLo1, the processor ignores the 
      ASID during TLB lookup. 
0   : RFU. Must be written as zeroes, and returns zeroes when read. 


Figure 5-10 TLB Entry Registers (2/2) 
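Likewise, an EntryLo0 or EntryLo1 value can be assembled from a physical address and the attribute bits shown in Figure 5-10. `entrylo` and `c_uses_cache` are hypothetical helpers; the sketch covers the 32-bit layout (the 64-bit layout differs only in the wider zero field above bit 25).

```c
#include <stdbool.h>
#include <stdint.h>

/* Build an EntryLo value: PFN (physical address bits 31:12) in bits 25:6,
   C in bits 5:3, then D, V, and G in bits 2, 1, and 0. */
uint32_t entrylo(uint32_t pa, unsigned c, bool d, bool v, bool g)
{
    return ((pa >> 12) << 6)
         | ((c & 0x7u) << 3)
         | ((uint32_t)d << 2)
         | ((uint32_t)v << 1)
         | (uint32_t)g;
}

/* Table 5-6: every C value except 2 selects "cache is used". */
bool c_uses_cache(unsigned c)
{
    return (c & 0x7u) != 2u;
}
```

Because a page pair is treated as global only when G is set in both EntryLo0 and EntryLo1, software would normally pass the same `g` value when building both registers.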


Whether the cache is used when a page is referenced is specified by the page 
coherency attribute (C) bits of the TLB entry. For each page, software selects 
either "cache is used" or "cache is not used" as the cache algorithm. Table 5-6 
shows the page attributes selected by the C bits. 


Table 5-6 Cache Algorithm 


Value of C Bit Cache Algorithm 
0 Cache is used 
1 Cache is used 
2 Cache is not used 
3 Cache is used 
4 Cache is used 
5 Cache is used 
6 Cache is used 
7 Cache is used 
5.4 CP0 Registers


The following sections describe the CP0 registers that are used by the memory
management system and can be accessed by software (each register name is
followed by its register number in parentheses).


5.4.1 Index Register (0) 


The Index register is a 32-bit, read/write register containing six bits to index an 
entry in the TLB. The most-significant bit of the register shows the success or 
failure of a TLB Probe (TLBP) instruction. 


The Index register also specifies the TLB entry affected by the TLB Read (TLBR)
and TLB Write Index (TLBWI) instructions.


Although the Index field of the Index register is six bits wide, only the five
least-significant bits (4:0) are used in TLB operations, since the VR4300 TLB has
32 entries. Bit 5 is readable and writable, but is ignored during TLB operations.


The value of the Index register on reset is undefined. Therefore, initialize the
Index register in software.


Index Register

Bits:   31   30:6   5:0
Field:  P    0      Index
Width:  1    25     6

P : Probe success or failure. Set to 1 when the previous TLB Probe (TLBP) instruction was unsuccessful; set to 0 when successful.
Index : Index to the TLB entry affected by the TLB Read and TLB Write instructions.
0 : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 5-11 Index Register 
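As an illustration only, the following C sketch decodes a value that software has already read from the Index register (for example, with MFC0). The macro and function names are hypothetical; the sketch simply applies the bit layout of Figure 5-11, including the rule that bit 5 is ignored in TLB operations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index register (CP0 register 0) fields, per Figure 5-11. */
#define INDEX_P_BIT    (UINT32_C(1) << 31) /* 1: last TLBP found no match */
#define INDEX_TLB_BITS UINT32_C(0x1F)      /* bits 4:0; bit 5 is ignored  */

/* True when the previous TLBP instruction failed to find a matching entry. */
static bool index_probe_failed(uint32_t index_reg)
{
    return (index_reg & INDEX_P_BIT) != 0;
}

/* TLB entry (0..31) used by TLBR/TLBWI; bit 5 does not take part. */
static unsigned index_tlb_entry(uint32_t index_reg)
{
    return (unsigned)(index_reg & INDEX_TLB_BITS);
}
```

For example, a value of 0x0000003F selects entry 31, because bit 5 is dropped.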
5.4.2 Random Register (1)


The Random register is a read-only register of which six bits are used to refer
to a TLB entry. Although the Random field is six bits wide, only the five
low-order bits (4:0) are used in TLB operations, since the VR4300 TLB has 32
entries.

Bit 5 is readable and writable by software, but is ignored during TLB operations.


This register decrements as each instruction executes, and its value ranges
between a lower and an upper bound, as follows:

• The lower bound is the contents of the Wired register.
• The upper bound is 31.


The Random register specifies the entry in the TLB that is affected by the TLB 
Write Random instruction. The register does not need to be read for this purpose; 
however, the register is readable to verify proper operation of the processor. 


To simplify testing, the Random register is set to the value of the upper bound 
upon Cold Reset. This register is also set to the upper bound when the Wired 
register is written. 


Figure 5-12 shows the format of the Random register. 


Random Register

Bits:   31:6   5:0
Field:  0      Random
Width:  26     6

Random : TLB Random index.
0 : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 5-12 Random Register 
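The bounds and reset behavior described above can be modeled in software to reason about which entries TLBWR may replace. This is a hypothetical C model, not a description of the hardware implementation; it assumes that after reaching the value of the Wired register, the register wraps back to the upper bound of 31.

```c
/* Hypothetical software model of the Random register (not a hardware API). */
typedef struct {
    unsigned random; /* current Random value                  */
    unsigned wired;  /* lower bound: Wired register bits 4:0  */
} random_model;

#define RANDOM_UPPER_BOUND 31u

/* Cold Reset: Wired is cleared and Random is set to the upper bound. */
static void random_cold_reset(random_model *m)
{
    m->wired = 0u;
    m->random = RANDOM_UPPER_BOUND;
}

/* Writing Wired also sets Random to the upper bound; only bits 4:0 are used. */
static void random_write_wired(random_model *m, unsigned wired)
{
    m->wired = wired & 0x1Fu;
    m->random = RANDOM_UPPER_BOUND;
}

/* One instruction executes: decrement, wrapping from the lower bound to 31. */
static void random_step(random_model *m)
{
    if (m->random <= m->wired)
        m->random = RANDOM_UPPER_BOUND;
    else
        m->random--;
}
```

With Wired = 30, for instance, the model alternates between 31 and 30, so TLBWR can never select entries 0 through 29.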
5.4.3 EntryHi (10), EntryLo0 (2), EntryLo1 (3), and PageMask (5) Registers


These registers are used to rewrite the TLB or to check for a matching TLB entry
when addresses are translated. If a TLB exception occurs, information on the
address that caused the exception is loaded into these registers. Figure 5-10
shows the formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers.


The values of these registers on reset are undefined. Therefore, initialize the
registers in software.


EntryHi Register 


The EntryHi register is a read/write register and is used to access the high-order 
bits of the internal TLB. 


The EntryHi register holds the high-order bits of a TLB entry when a TLB read or
write operation is executed. If a TLB Miss, TLB Invalid, or TLB Modification
exception occurs, the virtual page number (VPN2) of the virtual address that
caused the exception and the ASID are loaded into the EntryHi register. For
details of TLB exceptions, refer to Chapter 6 Exception Processing.


The ASID field is used to write or read the ASID field of the TLB entry. When an
address is translated, the ASID field of EntryHi is used as the ASID of the
virtual address and is compared with the ASID of the TLB entry.


To access this register, use the TLBP, TLBWR, TLBWI, or TLBR instruction. 
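A minimal sketch, assuming 32-bit mode, of how a TLB refill routine might compose the EntryHi value before a TLBWI or TLBWR. The function and macro names are hypothetical, and the MTC0 write itself is not shown. VPN2 occupies bits 31:13 and ASID bits 7:0, as in Figure 5-10; for pages larger than 4 KB, the low-order VPN2 bits are excluded from the comparison by PageMask.

```c
#include <stdint.h>

#define ENTRYHI_VPN2_MASK32 UINT32_C(0xFFFFE000) /* VPN2, bits 31:13 */
#define ENTRYHI_ASID_MASK   UINT32_C(0x000000FF) /* ASID, bits 7:0   */

/* Hypothetical helper: 32-bit-mode EntryHi for virtual address vaddr and asid.
   VA bit 12 is dropped because one entry maps an even/odd page pair, and
   bits 12:8 (RFU) are written as zero. */
static uint32_t entryhi_make32(uint32_t vaddr, unsigned asid)
{
    return (vaddr & ENTRYHI_VPN2_MASK32) | ((uint32_t)asid & ENTRYHI_ASID_MASK);
}
```

For example, entryhi_make32(0x00403ABC, 0x2A) yields 0x0040202A.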


EntryLo0 and EntryLo1 Registers


EntryLo consists of two registers: EntryLo0 for even virtual pages and EntryLo1
for odd virtual pages. The EntryLo0 and EntryLo1 registers are read/write
registers used to access the low-order bits of the internal TLB. When a TLB read
or write operation is executed, EntryLo0 and EntryLo1 access the low-order bits
of the TLB entry for the even and odd pages, respectively.
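The following hypothetical helper packs the EntryLo0/EntryLo1 fields in the 32-bit-mode layout of Figure 5-10 (PFN in bits 25:6, C in bits 5:3, and D, V, and G in bits 2, 1, and 0). It assumes that the PFN is the physical address divided by 4 KB. Note that G takes effect only if it is set in both EntryLo0 and EntryLo1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build an EntryLo0/EntryLo1 value (32-bit-mode layout). */
static uint32_t entrylo_make(uint32_t paddr, unsigned c,
                             bool dirty, bool valid, bool global)
{
    uint32_t pfn = (paddr >> 12) & UINT32_C(0xFFFFF); /* 20-bit page frame number */
    return (pfn << 6)                 /* PFN, bits 25:6              */
         | ((uint32_t)(c & 7u) << 3)  /* C, bits 5:3 (Table 5-6)     */
         | ((uint32_t)dirty << 2)     /* D: page is writable         */
         | ((uint32_t)valid << 1)     /* V: entry is valid           */
         | (uint32_t)global;          /* G: set in both EntryLo regs */
}
```

For example, entrylo_make(0x00123000, 3, true, true, false) describes a cached (C = 3), writable, valid page at physical address 0x00123000.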
PageMask Register


The PageMask register is a read/write register used for reading from or writing to
the TLB; it holds a comparison mask that sets the page size for each TLB entry,
as shown in Table 5-7. Seven page sizes can be selected. TLB read and write
operations use this register as either a destination or a source. When a virtual
address is presented for translation into a physical address, the virtual address
bits in the range 24:13 that correspond to 1s in the Mask field are excluded from
the comparison. When the Mask field is not one of the values shown in Table 5-7,
the operation of the TLB is undefined.


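Table 5-7 Mask Field Values for Page Sizes

Page Size   Mask field, bit 24 ... bit 13
4 KB        0 0 0 0 0 0 0 0 0 0 0 0
16 KB       0 0 0 0 0 0 0 0 0 0 1 1
64 KB       0 0 0 0 0 0 0 0 1 1 1 1
256 KB      0 0 0 0 0 0 1 1 1 1 1 1
1 MB        0 0 0 0 1 1 1 1 1 1 1 1
4 MB        0 0 1 1 1 1 1 1 1 1 1 1
16 MB       1 1 1 1 1 1 1 1 1 1 1 1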
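A sketch of choosing a PageMask value in C. The Mask field occupies bits 24:13, and each fourfold increase in page size sets two more Mask bits starting from bit 13 (the MIPS R4000-style encoding, assumed here to match Table 5-7). The function name is hypothetical; it returns UINT32_MAX for sizes not in the table, because other Mask values leave TLB operation undefined.

```c
#include <stdint.h>

/* Hypothetical helper: PageMask register value for a page size in bytes. */
static uint32_t pagemask_for_size(uint32_t size)
{
    switch (size) {
    case UINT32_C(4) << 10:   return UINT32_C(0x000) << 13; /* 4 KB   */
    case UINT32_C(16) << 10:  return UINT32_C(0x003) << 13; /* 16 KB  */
    case UINT32_C(64) << 10:  return UINT32_C(0x00F) << 13; /* 64 KB  */
    case UINT32_C(256) << 10: return UINT32_C(0x03F) << 13; /* 256 KB */
    case UINT32_C(1) << 20:   return UINT32_C(0x0FF) << 13; /* 1 MB   */
    case UINT32_C(4) << 20:   return UINT32_C(0x3FF) << 13; /* 4 MB   */
    case UINT32_C(16) << 20:  return UINT32_C(0xFFF) << 13; /* 16 MB  */
    default:                  return UINT32_MAX; /* not a Table 5-7 size */
    }
}
```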
5.4.4 Wired Register (6)


The Wired register is a read/write register that specifies the boundary between the 
wired and random entries of the TLB as shown in Figure 5-13. Wired entries are 
fixed, nonreplaceable entries, which cannot be overwritten by a TLBWR (TLB 
Write Random) operation. They can, however, be overwritten by a TLBWI (TLB 
Write Indexed) instruction. Random entries can be overwritten. 


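      TLB
  31  +---------------------------+
      |  Range of Random entries  |
      +---------------------------+  <- Value of Wired register
      |  Range of Wired entries   |
   0  +---------------------------+


Figure 5-13 Wired Register Boundary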


Although the Wired field is six bits wide, only the five low-order bits are used in
TLB operations, since the VR4300 TLB has 32 entries. Bit 5 is readable and
writable by software, but is ignored during TLB operations.


The Wired register is set to 0 upon Cold Reset. Writing this register also sets the 
Random register to the value of its upper bound of 31 (Refer to 5.4.2 Random 
Register (1)). Figure 5-14 shows the format of the Wired register. 


Wired Register

Bits:   31:6   5:0
Field:  0      Wired
Width:  26     6

Wired : TLB Wired boundary.
0 : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 5-14 Wired Register 
5.4.5 Processor Revision Identifier (PRId) Register (15)


The 32-bit, read-only Processor Revision Identifier (PRId) register contains
information identifying the implementation and revision level of the CPU and
CP0. Figure 5-15 shows the format of the PRId register.


PRId Register

Bits:   31:16   15:8   7:0
Field:  0       Imp    Rev
Width:  16      8      8

Imp : Processor ID number (0x0B for the VR4300 series™)
Rev : Processor revision number
0 : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 5-15 Processor Revision Identifier Register 


The processor revision number is a value in the format of yx. y is the major 
revision number contained in bits 7:4, and x is the minor revision number 
contained in bits 3:0. 


The processor revision number identifies the revision of the chip. However, a
revision of the chip is not always reflected in the PRId register. Conversely, a
change in the revision number does not always reflect an actual change to the
chip. Therefore, do not write programs that depend on the processor revision
number field.
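Consistent with the caution above, a program would normally examine only the Imp field. The hypothetical helpers below split a PRId value into its fields; the major/minor split follows the yx format described earlier, and the revision helpers are intended for diagnostic display only.

```c
#include <stdint.h>

/* Hypothetical PRId decoding helpers (Imp: bits 15:8, Rev: bits 7:0). */
static unsigned prid_imp(uint32_t prid)       { return (prid >> 8) & 0xFFu; }
static unsigned prid_rev_major(uint32_t prid) { return (prid >> 4) & 0x0Fu; } /* y */
static unsigned prid_rev_minor(uint32_t prid) { return prid & 0x0Fu; }        /* x */
```

For example, software might accept any PRId whose prid_imp() value is 0x0B, regardless of the revision field.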


5.4.6 Config Register (16) 


This register indicates or sets various operating states of the VR4300.


Although this register is designed to remain compatible with the Config register
of the VR4400, some of its bits are fixed values.


The EP and BE fields are initialized on cold reset. These fields can be read or
written by software. Their default values are as follows:


EP: 0000 
BE: 1 


The CU bit and K0 field can be read or written by software. However, because
they are not initialized, the user must set them to appropriate values after
reset.
The values of the EP and BE fields can be changed only when initialization is
executed in an uncached area immediately after cold reset, before any store
instruction is executed. Operation is not guaranteed if the values of these fields
are changed at any other time. Figure 5-16 shows the format of the Config
register.


Config Register

Bits:   31   30:28   27:24   23:16      15   14:4          3    2:0
Field:  0    EC      EP      00000110   BE   11001000110   CU   K0
Width:  1    3       4       8          1    11            1    3

EC : Operating frequency ratio (read-only). The value displayed corresponds to the frequency ratio set by the DivMode pins at power-on.
(For details of the DivMode pin settings, refer to Table 2-2 Clock/Control Interface Signals.)


μPD30200-80 (VR4305)
110 → 1:1 (MasterClock: PClock)
111 → RFU
000 → 1:2
001 → 1:3
Others → RFU

μPD30200-100 (VR4300)
110 → RFU
111 → 1:1.5 (MasterClock: PClock)
000 → 1:2
001 → 1:3
Others → RFU

μPD30200-133 (VR4300)
110 → 1:4 (MasterClock: PClock)
111 → RFU
000 → 1:2
001 → 1:3
Others → RFU

μPD30210-133 (VR4310)
010 → 1:5 (MasterClock: PClock)
011 → 1:6
100 → RFU
101 → 1:3
110 → 1:4
111 → RFU
000 → 1:2
001 → 1:3


Figure 5-16 Config Register (1/2)
μPD30210-167 (VR4310)
010 → 1:5 (MasterClock: PClock)
011 → 1:6
100 → 1:2.5
101 → 1:3
110 → 1:4
111 → RFU
000 → 1:2
001 → 1:3


EP : Sets the transfer data pattern (single/block write request).
0 → D (default on cold reset)
6 → DxxDxx: 2 doublewords/6 cycles
Others → RFU

BE : Sets BigEndianMem (endianness).
0 → Little endian
1 → Big endian (default on cold reset)

CU : RFU. However, can be read or written by software.
K0 : Sets the coherency algorithm of kseg0 (refer to Table 5-6 Cache Algorithm).
010 → Cache is not used
Others → Cache is used

1 : Returns 1 when read.
0 : Returns 0 when read.


Caution If the BE bit of this register is changed by using the MTC0 instruction,
insert two or more NOP instructions, or instructions other than load/store
instructions, between the MTC0 instruction and the next load/store instruction.


Figure 5-16 Config Register (2/2) 
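A minimal sketch of Config field accessors in C, following the bit positions in Figure 5-16 (EC in bits 30:28, EP in bits 27:24, BE in bit 15, K0 in bits 2:0). The names are hypothetical. Only K0 is modified here, by read-modify-write, because EP and BE may be changed only immediately after cold reset, and a BE change also requires the instruction spacing given in the caution above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical Config register (CP0 register 16) field accessors. */
static unsigned config_ec(uint32_t cfg)         { return (cfg >> 28) & 0x7u; } /* EC, 30:28 */
static unsigned config_ep(uint32_t cfg)         { return (cfg >> 24) & 0xFu; } /* EP, 27:24 */
static bool     config_big_endian(uint32_t cfg) { return ((cfg >> 15) & 1u) != 0; } /* BE */
static unsigned config_k0(uint32_t cfg)         { return cfg & 0x7u; }         /* K0, 2:0  */

/* Replace K0 while keeping the other bits; K0 = 2 (010) makes kseg0 uncached. */
static uint32_t config_set_k0(uint32_t cfg, unsigned k0)
{
    return (cfg & ~UINT32_C(0x7)) | (k0 & 0x7u);
}
```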
5.4.7 Load Linked Address (LLAddr) Register (17)


The read/write Load Linked Address (LLAddr) register contains the physical 
address read by the most recent Load Linked instruction. This register is for 
diagnostic purposes only. 


Figure 5-17 shows the format of the LLAddr register. The PAddr field in the figure
holds bits 31:4 of the physical address, PA(31:4), read when the LL instruction
was executed, zero-extended in its four high-order bits.


The contents of the LLAddr register are undefined on reset. 


LLAddr Register

Bits:   31:0
Field:  PAddr
Width:  32

PAddr : Stores bits 31 through 4 of the physical address read by the last LL instruction in bits 27 through 0, and 0 in bits 31 through 28.


Figure 5-17 LLAddr Register 
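Because LLAddr holds PA(31:4) right-justified, a diagnostic routine could recover the address read by the last LL instruction, to 16-byte granularity, as sketched below. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: LLAddr holds PA(31:4) in bits 27:0, so shifting left
   by four restores the address; the low four address bits read back as 0. */
static uint32_t lladdr_to_paddr(uint32_t lladdr)
{
    return (lladdr & UINT32_C(0x0FFFFFFF)) << 4;
}
```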


5.4.8 Cache Tag Registers [TagLo (28) and TagHi (29)] 


The TagLo and TagHi registers are 32-bit read/write registers that hold the
primary cache tag for cache initialization, cache diagnostics, or cache error
processing. The Tag registers are written by the CACHE and MTC0 instructions.


Figure 5-18 shows the format of these registers. 


The contents of these registers are undefined on reset. 
TagLo Register

Bits:   31:28   27:8     7:6      5:0
Field:  0       PTagLo   PState   0
Width:  4       20       2        6

TagHi Register

Bits:   31:0
Field:  0
Width:  32

PTagLo : Physical address bits 31:12
PState : Specifies the primary cache state
  Data cache: 11 = Valid, 00 = Invalid
  Instruction cache: 10 = Valid, 00 = Invalid
  Others = Undefined
0 : RFU. Must be written as zeroes; returns zeroes when read


Cautions 1. If 10 is written to PState by using the CACHE
(Index_Store_Tag) instruction, the cache line becomes Clean.
However, 11 is read when the PState value is read by using the
CACHE (Index_Load_Tag) instruction.

2. If 01 is written to PState by using the CACHE
(Index_Store_Tag) instruction, the cache operation is not
guaranteed.

3. If 11 is written to PState by using the CACHE
(Index_Store_Tag) instruction, the cache line becomes Dirty.


Figure 5-18 TagLo and TagHi Register 
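The following hypothetical helpers pack and unpack TagLo values for use with the CACHE Index_Store_Tag and Index_Load_Tag operations. They follow Figure 5-18 (PTagLo in bits 27:8 holding PA(31:12), PState in bits 7:6). Per the cautions above, software would write 11 or 10 for a valid data or instruction cache line, 00 to invalidate a line, and never 01.

```c
#include <stdint.h>

/* Hypothetical helper: TagLo value for physical address paddr and a PState. */
static uint32_t taglo_make(uint32_t paddr, unsigned pstate)
{
    return ((paddr >> 12) << 8)             /* PTagLo = PA(31:12), bits 27:8 */
         | ((uint32_t)(pstate & 3u) << 6);  /* PState, bits 7:6              */
}

/* PA(31:12) recovered from TagLo; the low 12 bits are returned as zero. */
static uint32_t taglo_paddr(uint32_t taglo)
{
    return ((taglo >> 8) & UINT32_C(0xFFFFF)) << 12;
}

static unsigned taglo_pstate(uint32_t taglo)
{
    return (taglo >> 6) & 3u;
}
```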


5.4.9 Virtual-to-Physical Address Translation Process 


During virtual-to-physical address translation, the CPU compares the 8-bit ASID
of the virtual address (if the Global bit, G, is not set) with the ASID of the TLB
entry to see if there is a match. One of the following comparisons is also made:

• In 32-bit mode, the high-order bits* of the virtual address are
compared with the contents of the VPN2 (virtual page number
divided by two) field of the TLB entry.

• In 64-bit mode, the high-order bits* of the virtual address are
compared with the contents of the VPN2 (virtual page number
divided by two) field of the TLB entry.
If a TLB entry matches, the physical address and access control bits (C, D, and V)
are retrieved from the matching TLB entry. While the V bit of the entry must be 
set for a valid translation to take place, it is not involved in the determination of a 
matching TLB entry. 


Figure 5-19 illustrates the TLB address translation process. 


* The number of bits differs depending on the page size. 
Here are examples where the page size is 16 MB and 4 KB: 
Mode          Page size 16 MB            Page size 4 KB
32-bit mode   A(31:25)                   A(31:13)
64-bit mode   A63, A62, and A(39:25)     A63, A62, and A(39:13)


Memory Management System 


[Flowchart: Virtual Address (Input: VPN and ASID) → Legal address? (if not, Address Error exception) → TLB lookup, which can result in a TLB Miss or XTLB Miss exception, a TLB Invalid exception, or a TLB Mod exception → Access Cache or Main Memory → Physical Address (Output)]


Figure 5-19 TLB Address Translation
TLB Misses


If there is no TLB entry that matches the virtual address, a TLB miss exception 
occurs.* If the access control bits (D and V) indicate that the access is not valid, 
a TLB Modification exception or TLB Invalid exception occurs. If the C bits 
equal 010, the physical address that is retrieved accesses main memory, bypassing 
the cache. 


* TLB miss exceptions are described in Chapter 6 Exception Processing. 
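The translation steps above can be summarized as a simplified software model in C. It is an illustrative sketch only, with hypothetical type and function names: it covers 32-bit mode, treats each EntryLo as the 32-bit layout of Figure 5-10, and does not model address-error checks, the C attribute, 64-bit (XTLB) translation, or the TLB shutdown that occurs when several entries match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one TLB entry (32-bit mode). */
typedef struct {
    uint32_t pagemask;   /* Mask field in bits 24:13 (Table 5-7)        */
    uint32_t entryhi;    /* VPN2 in bits 31:13, ASID in bits 7:0        */
    uint32_t entrylo[2]; /* [0]: even page, [1]: odd page (PFN/C/D/V/G) */
} tlb_entry;

typedef enum { TLB_OK, TLB_MISS, TLB_INVALID, TLB_MOD } tlb_result;

static tlb_result tlb_translate(const tlb_entry *tlb, size_t n, uint32_t vaddr,
                                unsigned asid, bool is_store, uint32_t *paddr)
{
    for (size_t i = 0; i < n; i++) {
        const tlb_entry *e = &tlb[i];
        /* Mask bits set in PageMask drop the matching VPN2 bits from the compare. */
        uint32_t cmp_mask = UINT32_C(0xFFFFE000) & ~e->pagemask;
        /* The ASID is ignored only if G is set in both EntryLo0 and EntryLo1. */
        bool global = (e->entrylo[0] & e->entrylo[1] & 1u) != 0;

        if ((vaddr & cmp_mask) != (e->entryhi & cmp_mask))
            continue;
        if (!global && (e->entryhi & 0xFFu) != (asid & 0xFFu))
            continue;

        /* Matched. V is checked only now; it does not affect matching.
           offset_mask covers the page offset; the next bit up picks even/odd. */
        uint32_t offset_mask = (e->pagemask | UINT32_C(0x1FFF)) >> 1;
        uint32_t lo = e->entrylo[(vaddr & (offset_mask + 1u)) ? 1 : 0];
        if ((lo & 2u) == 0)              /* V = 0: TLB Invalid exception   */
            return TLB_INVALID;
        if (is_store && (lo & 4u) == 0)  /* D = 0 on a store: TLB Mod      */
            return TLB_MOD;

        uint32_t pfn = (lo >> 6) & UINT32_C(0xFFFFF);
        *paddr = ((pfn << 12) & ~offset_mask) | (vaddr & offset_mask);
        return TLB_OK;
    }
    return TLB_MISS; /* no matching entry: TLB Miss exception */
}
```

As the text notes, the V bit plays no part in selecting the matching entry; the model checks it only after a match is found.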


TLB Instructions 


The following instructions are used to control the TLB. 


TLBP (Translation Lookaside Buffer Probe) 


Loads the number of the TLB entry that matches the contents of the EntryHi
register into the Index register. If no TLB entry matches, the most significant bit
of the Index register is set.


TLBR (Translation Lookaside Buffer Read) 


Writes the contents of the TLB entry indicated by the Index register to the
EntryHi, EntryLo0, EntryLo1, and PageMask registers.


TLBWI (Translation Lookaside Buffer Write Index) 


Writes the contents of the EntryHi, EntryLo0, EntryLo1, and PageMask registers 
to the TLB entry indicated by the contents of the Index register. 


TLBWR (Translation Lookaside Buffer Write Random) 


Writes the contents of the EntryHi, EntryLo0, EntryLo1, and PageMask registers 
to the TLB entry indicated by the contents of the Random register. 
This chapter describes the exception processing and the hardware used for the
exception processing. For the FPU exception, refer to Chapter 8 Floating-Point 
Exceptions. 
The processor receives exceptions from a number of sources, including translation
lookaside buffer (TLB) misses, arithmetic overflows, I/O interrupts, and system 
calls. When the CPU detects an exception, the normal sequence of instruction 
execution is suspended and the processor enters Kernel mode (refer to Chapter 5 
Memory Management System for a description of system operating modes). 
The processor then disables interrupts and forces execution of a software 
exception process (called an exception handler) located at a fixed address. The 
handler saves the context of the processor, including the contents of the program 
counter, the current operating mode (User or Supervisor), and the status of the 
interrupts (enabled or disabled). This context is saved so it can be restored when 
the exception processing has been performed. 


When an exception occurs, the CPU loads the Exception Program Counter (EPC)
register with the location at which execution can restart after the exception has
been processed. The restart location in the EPC register is the address of the
instruction that caused the exception. If that instruction was executing in a branch
delay slot, the CPU loads the EPC register with the address of the branch
instruction immediately preceding the branch delay slot.


For exception processing, the following modes can be set:
• Interrupt enable (IE)
• Base operating mode (User, Supervisor, or Kernel)
• Exception level (normal or exception, as indicated by the EXL bit in
the Status register)
• Error level (normal or error, as indicated by the ERL bit in the Status
register)


Each setting condition is described below. 


Interrupt Enable 


Interrupts are enabled if all of the following conditions are satisfied:
• IE (interrupt enable bit) = 1
• EXL bit = 0 and ERL bit = 0
• The corresponding bit of the IM field in the Status register = 1


Base Operating Mode 


The operating mode that is the basis when the exception level is normal (0) is 
specified by the KSU area of the Status register. 
Exception/Error Level
The processor is in Kernel mode when either the EXL or the ERL bit is set to 1.


When execution returns from exception processing, the exception level is reset to 
normal (0) (for details, refer to ERET Instruction of Chapter 16 CPU 
Instruction Set Details). 


In addition to the above, registers that hold information on addresses, causes, and 
statuses during exception processing are provided. For details, refer to 6.3 
Exception Processing Registers. For details of the exception processing, refer to 
6.4 Exception Details. 


6.2 Precision of Exceptions 


VR4300 exceptions are logically precise; the instruction that causes an exception 
and all those that follow it are aborted and can be re-executed after servicing the 
exception. When succeeding instructions are killed, exceptions associated with 
those instructions are also killed. Exceptions are not taken in the order detected, 
but in instruction fetch order. 


6.3 Exception Processing Registers 


This section describes the CP0 registers that are used in exception processing.
Table 6-1 lists these registers along with their register numbers; each register has
a unique identification number, referred to as its register number. The remaining
CP0 registers are used in memory management, as described in Chapter 5
Memory Management System.


Software examines the CP0 registers to determine the cause of the exception and
the state of the CPU at the time the exception occurred. The registers in Table 6-1
are used in exception processing, and are described in the sections that follow.
Table 6-1 CP0 Exception Processing Registers

Register Name                                  Reg. No.
Context                                        4
BadVAddr (Bad Virtual Address)                 8
Count                                          9
Compare                                        11
Status                                         12
Cause                                          13
EPC (Exception Program Counter)                14
WatchLo                                        18
WatchHi                                        19
XContext                                       20
PErr*                                          26
CacheErr (Cache Error)*                        27
ErrorEPC (Error Exception Program Counter)     30

* This register is defined to maintain compatibility between the VR4300 and the
VR4200, and is not used by the VR4300 hardware.


Hazard of CP0 


With the general-purpose registers of the CPU, when the result of an operation is
to be used by the next instruction, the hardware generates a stall and waits until
the result can be used. However, CP0 register and TLB operations do not generate
a stall. If a value is stored to a CP0 register, the immediately following instruction
may not be able to use that value, because the value is stored in the register
several cycles later. When designing a program, therefore, take this into
consideration when setting values in the CP0 registers and the TLB (for details,
refer to Chapter 19 Coprocessor 0 Hazards).
6.3.1 Context Register (4)


The Context register is a read/write register containing a pointer to an entry in
the page table entry (PTE) array in memory; this array is an operating system data
structure that stores virtual-to-physical address translations. When there is a TLB
miss, the operating system loads the TLB with the missing translation from the
PTE array. The Context register is used by the TLB Miss exception handler to
load the TLB entry.


The Context register duplicates some of the information provided in the
BadVAddr register, but the information is arranged in a form that is more useful
for a software TLB exception handler.


Figure 6-1 shows the format of the Context register. 


Context Register (32-bit mode)

Bits:   31:23     22:4      3:0
Field:  PTEBase   BadVPN2   0
Width:  9         19        4

Context Register (64-bit mode)

Bits:   63:23     22:4      3:0
Field:  PTEBase   BadVPN2   0
Width:  41        19        4

PTEBase : Base address of the page table entry array
BadVPN2 : Page number of the virtual address whose translation is invalid, divided by 2
0 : RFU. Must be written as zeroes; returns zeroes when read


Figure 6-1 Context Register 


The Context register bit field is described below. 


BadVPN2 field is written by hardware on a TLB miss. It contains the virtual page 
number (VPN2), divided by 2, of the most recent virtual address that did not have 
a valid translation. 


The PTEBase field can be read or written and is controlled by the operating system.
It is used only by software, as a pointer to the current PTE array in memory.


The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that caused
the TLB miss; bit 12 is excluded because a single TLB entry maps an even-odd
page pair. For a 4 KB page size, this format can be used directly as a pointer into
the table of 8-byte PTE pairs. For 16 KB pages or larger, shifting and masking
this value produces the correct PTE reference address.
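As an illustration of why the Context register can serve directly as a PTE pointer for 4 KB pages, the hypothetical function below computes the value the Context register would hold in 32-bit mode. BadVPN2 is VA(31:13) placed in bits 22:4, so consecutive VPN2 values are 16 bytes apart, which is the size of one even/odd pair of 8-byte PTEs. In the processor, hardware writes BadVPN2; the function only reproduces that arithmetic.

```c
#include <stdint.h>

/* Hypothetical illustration of the 32-bit Context register value:
   PTEBase (bits 31:23, set by the OS) | VA(31:13) in bits 22:4 | 0 in 3:0. */
static uint32_t context_value(uint32_t ptebase, uint32_t bad_vaddr)
{
    uint32_t badvpn2 = bad_vaddr >> 13; /* 19 bits: VA(31:13) */
    return (ptebase & UINT32_C(0xFF800000)) | (badvpn2 << 4);
}
```

For instance, with a PTE array at 0xC0000000 and a miss on 0x00403ABC, the result is 0xC0000000 + 0x201 × 16, the address of that page pair's PTEs.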
6.3.2 BadVAddr Register (8)


The Bad Virtual Address (BadVAddr) register is a read-only register that holds the
most recent virtual address that failed to translate, or the virtual address at which
an addressing error occurred. Figure 6-2 shows the format of the BadVAddr
register.


Caution This register is not updated when a bus error exception occurs,
because a bus error is not an address error exception.


BadVAddr Register (32-bit mode)

Bits:   31:0
Field:  Bad Virtual Address
Width:  32

BadVAddr Register (64-bit mode)

Bits:   63:0
Field:  Bad Virtual Address
Width:  64

BadVAddr : Virtual address at which the most recent address error occurred, or which most recently failed address translation


Figure 6-2 BadVAddr Register


6.3.3 Count Register (9)


The read/write Count register acts as a timer, incrementing at a constant rate
(half the PClock frequency) whether or not instructions are being executed. This
register is free-running: when it reaches all ones, it rolls over to zero and
continues counting. It can be used for diagnostic purposes, system initialization,
or synchronization between processes.


Figure 6-3 shows the format of the Count register. 


Count Register

Bits:   31:0
Field:  Count
Width:  32

Count : Latest count value (incremented at half the PClock frequency)


Figure 6-3 Count Register 
6.3.4 Compare Register (11)


The Compare register is used to generate a timer interrupt; it holds a stable
value that does not change on its own. When the value of the Compare register
equals the value of the Count register (refer to 6.3.3), interrupt bit IP(7) in the
Cause register is set. This causes an interrupt in the DF stage as soon as the
interrupt is enabled. As a side effect, writing a value to the Compare register
clears the timer interrupt.


For diagnostic purposes, the Compare register is a read/write register; in normal
use, however, it is only written. Figure 6-4 shows the format of the Compare
register.

Compare Register

Bits:   31:0
Field:  Compare
Width:  32

Compare : Value to be compared with the Count register


Figure 6-4 Compare Register 
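Because Count increments at half the PClock frequency and wraps at 32 bits, a timer interrupt a given time in the future can be scheduled by writing Compare with Count plus the corresponding number of ticks. The following is a minimal sketch; the function name is hypothetical, and the caller is assumed to supply the PClock frequency of its system.

```c
#include <stdint.h>

/* Hypothetical helper: Compare value that matches Count after `us`
   microseconds, given the PClock frequency in Hz (Count runs at PClock/2). */
static uint32_t compare_after_us(uint32_t count_now, uint32_t pclock_hz, uint32_t us)
{
    uint64_t ticks = (uint64_t)(pclock_hz / 2u) * us / 1000000u;
    /* 32-bit unsigned addition wraps just like the free-running Count. */
    return count_now + (uint32_t)ticks;
}
```

For example, with a 100 MHz PClock (an assumed value), 1 ms corresponds to 50,000 Count ticks. Writing the result to Compare also clears any pending timer interrupt.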


6.3.5 Status Register (12) 


The Status register (SR) is a read/write register that contains the operating mode, 
interrupt enabling, and the diagnostic states of the processor. Figure 6-5 shows 
the format of the entire register. 
Status Register

Bits:   31:28         27   26   25   24:16   15:8      7    6    5    4:3   2     1     0
Field:  CU(CU3:CU0)   RP   FR   RE   DS      IM(7:0)   KX   SX   UX   KSU   ERL   EXL   IE
Width:  4             1    1    1    9       8         1    1    1    2     1     1     1


CU : Controls the usability of each of the four coprocessor unit numbers
(1: usable, 0: unusable).
CP0 is always usable in Kernel mode, regardless of the setting of the CU0 bit.
CP2 and CP3 are reserved for future expansion.

RP : Enables low-power operation by reducing the internal clock frequency and the system
interface clock frequency to one-quarter speed.*
(0: normal, 1: low power mode) (For details, refer to 15.1.2 Low Power Mode.)

FR : Enables additional floating-point registers
(0: 16 registers, 1: 32 registers)

RE : Reverse-Endian bit; enables reversal of the system endianness in User mode.
(0: disabled, 1: reversed)

DS : Diagnostic Status field (see Figure 6-6 for details).

IM : Interrupt Mask field; enables external, internal, coprocessor, or software interrupts.
(0: disabled, 1: enabled)
IM(7) : Mask bit for the timer interrupt
IM(6:2) : Mask bits for external interrupts Int[4:0], or external write requests
IM(1:0) : Mask bits for software interrupts and IP(1:0) of the Cause register

KX : Enables 64-bit addressing in Kernel mode. When this bit is set, an XTLB Miss exception is
generated on TLB misses in the Kernel mode address space.
(0: 32-bit, 1: 64-bit)
64-bit operation is always valid in Kernel mode.

SX : Enables 64-bit addressing and operations in Supervisor mode. When this bit is set, an XTLB
Miss exception is generated on TLB misses in the Supervisor mode address space.
(0: 32-bit, 1: 64-bit)

UX : Enables 64-bit addressing and operations in User mode. When this bit is set, an XTLB Miss
exception is generated on TLB misses in the User mode address space.
(0: 32-bit, 1: 64-bit)

KSU : Specifies and indicates the mode bits
(10: User, 01: Supervisor, 00: Kernel)

ERL : Specifies and indicates the error level
(0: normal, 1: error)

EXL : Specifies and indicates the exception level
(0: normal, 1: exception)

IE : Specifies and indicates the global interrupt enable
(0: disable interrupts, 1: enable interrupts)


* The low power mode is supported only in the 100 MHz model of the VR4300 and in the VR4305.
Fix the RP bit to 0 in the 133 MHz model of the VR4300 and in the VR4310.


Figure 6-5 Status Register 


Exception Processing 


Figure 6-6 shows the format of the self-diagnostic status (DS) area. All the bits in 
the DS area, except the TS bit, can be read or written. 


Self-Diagnostic Status Field

Bits:   24    23   22    21   20   19   18   17   16
Field:  ITS   0    BEV   TS   SR   0    CH   CE   DE
Width:  1     1    1     1    1    1    1    1    1


ITS : Enables Instruction Trace Support.
For details, refer to 9.3.5 Instruction Trace Support.

BEV : Controls the location of the TLB miss and general-purpose exception vectors.
0: normal
1: bootstrap

TS : Indicates that TLB shutdown has occurred (read-only); used to avoid damage to the TLB if
more than one TLB entry matches a single virtual address.
0: has not occurred
1: has occurred
After TLB shutdown, the processor must be reset to restart. TLB shutdown can occur
even when a TLB entry that matches the virtual address is set to be invalid
(V bit of the entry is cleared).

SR : 0: Indicates that a Soft Reset or NMI has not occurred.
1: Indicates that a Soft Reset or NMI has occurred.

CH : CP0 condition bit.
0: false
1: true
Read/write access by software only; not accessible by hardware.

CE, DE : These bits are defined to maintain compatibility with the VR4200, and are not used by the
hardware of the VR4300.

0 : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 6-6 Self-Diagnostic Status Field 
Fields of the Status register set the modes and access states described in the
sections that follow. 


Instruction Trace Support 


The VR4300 can output the physical address of the branch destination on
SysAD(31:0) when the instruction address changes internally because of a branch
or jump instruction or the occurrence of an exception. To use this function, set
the ITS bit to 1.


An instruction cache miss is forcibly generated in the following cases to output the 
physical address at the branch destination. 


• If the branch condition is satisfied when a branch instruction is
executed

• If the value of the PC is changed by a jump instruction or the occurrence of
an exception


When the instruction cache miss is generated, the processor issues a block read
request on SysAD(31:0), which allows an external device to detect the change of
address.


Return response data for this processor block read request in the same manner
as for an ordinary request. The address output is not the value of the PC (a
virtual address), but a physical address.


Interrupt Enable 


Interrupts are enabled when all of the following conditions are satisfied:
• IE = 1
• EXL = 0
• ERL = 0
• The corresponding bit of IM is set to 1


168 User’s Manual U10504EJ7VOUMOO 




Operating Modes 


The following Status register bit settings are required for User, Kernel, and
Supervisor modes.
• The processor is in User mode when KSU = 10, EXL = 0, and ERL = 0.
• The processor is in Supervisor mode when KSU = 01, EXL = 0, and ERL = 0.
• The processor is in Kernel mode when KSU = 00, or EXL = 1, or ERL = 1.
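The interrupt-enable and operating-mode rules above can be expressed as small predicates over register values. This is a hypothetical C sketch using the bit positions of Figure 6-5 (IE = bit 0, EXL = bit 1, ERL = bit 2, KSU = bits 4:3, IM = bits 15:8). It also checks the matching IP bit of the Cause register, which occupies the same bit positions 15:8. KSU = 11 is not listed above, so the sketch reports it as undefined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register (CP0 register 12) bits, per Figure 6-5. */
#define SR_IE        (UINT32_C(1) << 0)
#define SR_EXL       (UINT32_C(1) << 1)
#define SR_ERL       (UINT32_C(1) << 2)
#define SR_KSU_SHIFT 3u /* KSU occupies bits 4:3                  */
#define IM_IP_SHIFT  8u /* IM (Status) and IP (Cause): bits 15:8  */

typedef enum { MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_UNDEFINED } cpu_mode;

/* Kernel when EXL = 1, ERL = 1, or KSU = 00; otherwise KSU selects the mode. */
static cpu_mode status_mode(uint32_t sr)
{
    unsigned ksu = (sr >> SR_KSU_SHIFT) & 3u;
    if ((sr & (SR_EXL | SR_ERL)) != 0 || ksu == 0u)
        return MODE_KERNEL;
    if (ksu == 1u)
        return MODE_SUPERVISOR;
    if (ksu == 2u)
        return MODE_USER;
    return MODE_UNDEFINED; /* KSU = 11 is not defined in this section */
}

/* Interrupt n (0..7) is taken when IE = 1, EXL = 0, ERL = 0, IM(n) = 1,
   and IP(n) in the Cause register is pending. */
static bool interrupt_taken(uint32_t sr, uint32_t cause, unsigned n)
{
    uint32_t bit = UINT32_C(1) << (IM_IP_SHIFT + (n & 7u));
    return (sr & SR_IE) != 0
        && (sr & (SR_EXL | SR_ERL)) == 0
        && (sr & bit) != 0
        && (cause & bit) != 0;
}
```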


32- and 64-bit Modes 


The following Status register bit settings select 32- or 64-bit operation for User,
Kernel, and Supervisor operating modes. Enabling 64-bit operation permits the
execution of 64-bit opcodes and translation of 64-bit addresses. 64-bit operation
for User, Kernel, and Supervisor modes can be set independently.

• 64-bit addressing for Kernel mode is enabled when KX = 1.
64-bit operations are always valid in Kernel mode.
• 64-bit addressing and operations are enabled for Supervisor mode
when SX = 1.
• 64-bit addressing and operations are enabled for User mode when UX = 1.


Kernel Address Space Accesses


Access to the kernel address space is allowed when the processor is in Kernel
mode.


Supervisor Address Space Accesses 


Access to the supervisor address space is allowed when the processor is in Kernel
or Supervisor mode.


User Address Space Accesses 


Access to the user address space is allowed in any of the three operating modes. 
Status on Reset


The contents of the Status register on reset are undefined except for the following
bits:

• TS and RP = 0
• ERL and BEV = 1
• SR = 0 on cold reset; SR = 1 on soft reset or NMI interrupt


Inverting Endian 
The Vp4300 is set to big endian at reset. After that, the endian setting can changed 
by using the BE bit of the Config register. 
• When RE bit = 1
  The endian setting in Kernel and Supervisor modes is specified by the BE bit of the Config register. The endian setting in User mode is the opposite of the specified setting.
• When RE bit = 0
  The endian setting in Kernel, Supervisor, and User modes is specified by the BE bit of the Config register.
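The effective byte order for a given mode follows from BE and RE, as sketched below. The bit positions (RE = Status bit 25, BE = Config bit 15) are assumptions based on the R4000-family register layouts and are not taken from this section. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions: RE = Status bit 25, BE = Config bit 15. */
#define SR_RE      (1u << 25)
#define CONFIG_BE  (1u << 15)

/* Returns true if accesses in the current mode are big endian.
 * user_mode: true when the processor is in User mode. */
bool is_big_endian(uint32_t status, uint32_t config, bool user_mode)
{
    bool be = (config & CONFIG_BE) != 0;   /* Kernel/Supervisor setting */
    if (user_mode && (status & SR_RE))
        return !be;                        /* RE = 1 reverses User mode */
    return be;
}
```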
6.3.6 Cause Register (13)


The Cause register is a 32-bit read/write register that holds the cause of the most recent exception. The 5-bit exception code field of this register indicates the cause of the exception (refer to Table 6-2). The remaining fields hold detailed information about specific exceptions. All bits except IP1 and IP0 are read-only. IP1 and IP0 are used to generate software interrupts. Figure 6-7 shows the format of the Cause register, and Table 6-2 describes the exception code field.


Cause Register 


 Bits:   31   30   29:28   27:16   15:8      7    6:2       1:0
 Field:  BD   0    CE      0       IP(7:0)   0    ExcCode   0
 Width:  1    1    2       12      8         1    5         2

BD      : Indicates whether the most recent exception occurred in a branch delay slot.
          1 = delay slot
          0 = normal
CE      : Coprocessor unit number referenced when a Coprocessor Unusable exception occurs. Undefined if this exception has not occurred.
IP(7:0) : Indicates whether an interrupt is pending.
          1 = interrupt pending
          0 = no interrupt
          IP(7)   : Timer interrupt
          IP(6:2) : External normal interrupts. Controlled by Int[4:0] or by external write requests.
          IP(1:0) : Software interrupts. Only these bits can cause an interrupt exception when they are set to 1 by software.
ExcCode : Exception code field (refer to Table 6-2 for details).
0       : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 6-7 Cause Register 
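As a sketch of how a handler might extract these fields from a raw Cause value, the C code below uses the bit positions from Figure 6-7. The struct and function names are hypothetical. In practice the value would be read from CP0 register 13 with MFC0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions taken from Figure 6-7. */
typedef struct {
    bool     bd;        /* bit 31: exception in branch delay slot */
    unsigned ce;        /* bits 29:28: coprocessor unit number */
    unsigned ip;        /* bits 15:8: pending interrupts IP(7:0) */
    unsigned exc_code;  /* bits 6:2: exception code (Table 6-2) */
} cause_fields;

cause_fields decode_cause(uint32_t cause)
{
    cause_fields f;
    f.bd       = (cause >> 31) & 1u;
    f.ce       = (cause >> 28) & 0x3u;
    f.ip       = (cause >> 8)  & 0xFFu;
    f.exc_code = (cause >> 2)  & 0x1Fu;
    return f;
}
```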
Table 6-2 Cause Register ExcCode Field


Exception Code Value   Mnemonic   Description
0                      Int        Interrupt
1                      Mod        TLB Modification exception
2                      TLBL       TLB Miss exception (load or instruction fetch)
3                      TLBS       TLB Miss exception (store)
4                      AdEL       Address Error exception (load or instruction fetch)
5                      AdES       Address Error exception (store)
6                      IBE        Bus Error exception (instruction fetch)
7                      DBE        Bus Error exception (data reference: load or store)
8                      Sys        Syscall exception
9                      Bp         Breakpoint exception
10                     RI         Reserved Instruction exception
11                     CpU        Coprocessor Unusable exception
12                     Ov         Arithmetic Overflow exception
13                     Tr         Trap exception
14                     -          RFU
15                     FPE        Floating-Point exception
16-22                  -          RFU
23                     WATCH      Watch exception
24-31                  -          RFU
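A handler that logs exceptions might map ExcCode to the mnemonics in Table 6-2 with a lookup table like the one below. The table and function names are hypothetical. Codes marked RFU map to "RFU".

```c
#include <stddef.h>

/* Mnemonics from Table 6-2; NULL marks RFU codes. */
static const char *const exc_mnemonic[32] = {
    [0]  = "Int",  [1]  = "Mod",  [2]  = "TLBL", [3]  = "TLBS",
    [4]  = "AdEL", [5]  = "AdES", [6]  = "IBE",  [7]  = "DBE",
    [8]  = "Sys",  [9]  = "Bp",   [10] = "RI",   [11] = "CpU",
    [12] = "Ov",   [13] = "Tr",   [15] = "FPE",  [23] = "WATCH",
};

/* Returns the mnemonic for an ExcCode value, or "RFU" if reserved. */
const char *exc_code_name(unsigned code)
{
    if (code >= 32 || exc_mnemonic[code] == NULL)
        return "RFU";
    return exc_mnemonic[code];
}
```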
The VR4300 has eight interrupt requests, IP7 through IP0, which are used for the following purposes.
IP7 


Indicates whether a timer interrupt request has been issued. This interrupt request is set when the contents of the Count register become equal to those of the Compare register.


IP6 through IP2 


IP6 through IP2 reflect the logical OR of two internal registers of the VR4300. One latches the state of the interrupt request pins in each cycle. The other is written by external write requests through the system interface.


IP1 and IP0

Software sets or clears a software interrupt request by writing the corresponding bit, IP1 or IP0.


For details, refer to Chapter 14 Interrupts. 


The floating-point exception uses the exception code contained in the floating-point 
control/status register (refer to Chapter 8 Floating-Point Exceptions). 
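IP1 and IP0 are the only writable bits of the Cause register, so raising or clearing a software interrupt is a read-modify-write of those bits. The pure function below only computes the new value. Reading and writing CP0 register 13 (MFC0/MTC0) is left to the caller. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAUSE_IP_SHIFT 8   /* IP(7:0) at bits 15:8 (Figure 6-7) */

/* Returns the Cause value to write back in order to set or clear
 * software interrupt request 'sw' (0 = IP0, 1 = IP1). Other bits pass
 * through unchanged; only IP1 and IP0 are writable in hardware. */
uint32_t cause_set_sw_irq(uint32_t cause, unsigned sw, bool pending)
{
    uint32_t bit = 1u << (CAUSE_IP_SHIFT + (sw & 1u));
    return pending ? (cause | bit) : (cause & ~bit);
}
```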
6.3.7 Exception Program Counter (EPC) Register (14)


The Exception Program Counter (EPC) is a read/write register that contains the 
address at which processing resumes after an exception has been serviced. 


The EPC register contains either: 
• the virtual address of the instruction that was the direct cause of the exception, or
• the virtual address of the immediately preceding branch or jump instruction (when the instruction that was the direct cause of the exception is in a branch delay slot, and the Branch Delay bit in the Cause register is set).


The EXL bit in the Status register is set to 1 to keep the processor from overwriting the address of the exception-causing instruction in the EPC register if another exception occurs.
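When the BD bit is set, the instruction that actually caused the exception is the one in the delay slot, 4 bytes after the address held in EPC. Below is a minimal C helper, with a hypothetical name, that uses the BD position from Figure 6-7.

```c
#include <stdint.h>

#define CAUSE_BD (1u << 31)   /* Branch Delay bit (Figure 6-7) */

/* Address of the instruction that directly caused the exception.
 * If BD is set, EPC points at the preceding branch, and the faulting
 * instruction is in the delay slot 4 bytes later. */
uint64_t faulting_pc(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}
```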


Figure 6-8 shows the format of the EPC register. 


EPC Register 


32-bit mode:  bits 31:0  EPC  (32 bits)
64-bit mode:  bits 63:0  EPC  (64 bits)

EPC : Address from which program execution resumes after exception processing


Figure 6-8 EPC Register 
6.3.8 WatchLo (18) and WatchHi (19) Registers


The VR4300 processor provides a debugging feature that detects references to a selected physical address. Load and store operations to that address cause a Watch exception. Figure 6-9 shows the format of the WatchLo and WatchHi registers.

Initialize these registers in software, because their values are undefined on reset.
WatchLo Register
 Bits:   31:3     2    1    0
 Field:  PAddr0   0    R    W
 Width:  29       1    1    1

WatchHi Register
 Bits:   31:4     3:0
 Field:  0        PAddr1
 Width:  28       4

PAddr1 : Bits 35:32 of a physical address.
         Because the most significant bit of a physical address handled by the VR4300 is bit 31, the value in this field is invalid. This field is provided to maintain software compatibility of the VR4300 with the VR4400 and VR4200, and all four bits of this field can be read.
PAddr0 : Bits 31:3 of the physical address
R      : When set to 1, an exception occurs when a load instruction is executed.
W      : When set to 1, an exception occurs when a store instruction is executed.
0      : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 6-9 WatchLo and WatchHi Registers 
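Below is a C sketch for programming WatchLo and modeling the watch comparison. PAddr0 holds only bits 31:3, so the watched range is the aligned 8-byte block that contains the address. That granularity is inferred from the field layout; the text does not state it. Both function names are hypothetical. WatchHi is ignored because PAddr1 is not used by the VR4300.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCHLO_R (1u << 1)   /* trap loads  (Figure 6-9) */
#define WATCHLO_W (1u << 0)   /* trap stores (Figure 6-9) */

/* Build a WatchLo value. PAddr0 = bits 31:3 of paddr; bit 2 is 0. */
uint32_t make_watchlo(uint32_t paddr, bool on_load, bool on_store)
{
    uint32_t v = paddr & ~0x7u;
    if (on_load)  v |= WATCHLO_R;
    if (on_store) v |= WATCHLO_W;
    return v;
}

/* Model of the match test (inferred 8-byte granularity, see above). */
bool watch_hits(uint32_t watchlo, uint32_t paddr, bool is_store)
{
    if (((watchlo ^ paddr) & ~0x7u) != 0)
        return false;                        /* bits 31:3 differ */
    return is_store ? (watchlo & WATCHLO_W) != 0
                    : (watchlo & WATCHLO_R) != 0;
}
```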
6.3.9 XContext Register (20)
The XContext register is a read/write register that points to an entry in the page table entry (PTE) array in memory. The PTE array is an operating system data structure that holds the table used to translate virtual addresses into physical addresses. When a TLB miss occurs, the operating system loads the missing translation from the PTE array into the TLB. This recovery is performed entirely in software.


The XContext register is used by the XTLB miss exception handler that loads a 
TLB entry in the 64-bit addressing mode. 


Although some of the information in this register duplicates that in the BadVAddr register, it is laid out in a format that is convenient for the XTLB Miss exception handler.


This register is used only by the operating system, which sets the PTEBase area as necessary.


Figure 6-10 shows the format of the XContext register. 


XContext Register 


 Bits:   63:33     32:31   30:4      3:0
 Field:  PTEBase   R       BadVPN2   0
 Width:  31        2       27        4
PTEBase : Base address of page table entry 
R : Space identifier (bits 63 and 62 of virtual address) 
00 — User 
01 — Supervisor 
11 — Kernel 
BadVPN2 : Virtual address whose translation is invalid (bits 39:13) 
0 : Must be written as zeroes, and returns zeroes when read. 


Figure 6-10 XContext Register 


Each bit area of the XContext register is described next. 
BadVPN2 Area


The BadVPN2 area is written by the hardware in case of a TLB miss. 


R Area 


The R area is written by the hardware in case of a TLB miss. 


PTEBase Area 


The PTEBase area is a read/write area and is used by the operating system. 


The 27-bit BadVPN2 area holds bits 39:13 of the virtual address that caused the TLB miss. Bit 12 is excluded because each TLB entry maps a pair of pages, one even and one odd. With a 4 KB page size, the register can be used directly as a pointer into a pair table of 8-byte PTEs. With a page size of 16 KB or more, an appropriate PTE address can be generated by shifting or masking the value of this register.
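The C sketch below shows how the register value is formed from the fields in Figure 6-10. It also makes the 4 KB case concrete: consecutive BadVPN2 values differ by 16 bytes, which is one pair of 8-byte PTEs. The code models what the hardware does on a miss. The function name is hypothetical.

```c
#include <stdint.h>

/* Field layout from Figure 6-10:
 * PTEBase 63:33, R 32:31, BadVPN2 30:4, bits 3:0 zero. */
uint64_t make_xcontext(uint64_t pte_base, uint64_t bad_vaddr)
{
    uint64_t base = pte_base & ~((UINT64_C(1) << 33) - 1);  /* keep bits 63:33 */
    uint64_t r    = (bad_vaddr >> 62) & 0x3u;               /* VA bits 63:62 */
    uint64_t vpn2 = (bad_vaddr >> 13) & 0x7FFFFFFu;         /* VA bits 39:13 */
    return base | (r << 31) | (vpn2 << 4);
}
```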
6.3.10 Parity Error (PErr) Register (26)


The Parity Error register is a read/write register. It is defined to maintain software compatibility of the VR4300 with the VR4200. Because the VR4300 does not support parity, this register is not used by the hardware.


Figure 6-11 shows the format of the Parity Error register. 


PErr Register 


 Bits:   31:8   7:0
 Field:  0      Diagnostic
 Width:  24     8

Diagnostic : 8-bit self-diagnosis area
0          : RFU. Must be written as zeroes, and returns zeroes when read.


Figure 6-11 PErr Register 


6.3.11 Cache Error (CacheErr) Register (27) 


The Cache Error register is a read-only register. It is defined to maintain compatibility of the VR4300 with the VR4200. Because the VR4300 does not generate cache errors, this register is not used by the hardware.


Figure 6-12 shows the format of the Cache Error register. 


CacheErr Register 


 Bits:   31:0
 Field:  0
 Width:  32


0 : RFU. Must be written as zeroes, and returns zeroes when read. 


Figure 6-12 CacheErr Register 
6.3.12 Error Exception Program Counter (Error EPC) Register (30)


The ErrorEPC register is similar to the EPC register. It is also used to store the 
program counter (PC) on Cold Reset, Soft Reset, and nonmaskable interrupt 
(NMI) exceptions. 


The read/write ErrorEPC register contains the virtual address at which instruction 
processing can resume after servicing an error. This address can be: 


• the virtual address of the instruction that caused the exception
• the virtual address of the immediately preceding branch or jump instruction, when the instruction that caused the error exception is in a branch delay slot.


There is no branch delay slot indication for the ErrorEPC register. 


Figure 6-13 shows the format of the ErrorEPC register. 


ErrorEPC Register 


32-bit mode:  bits 31:0  ErrorEPC  (32 bits)
64-bit mode:  bits 63:0  ErrorEPC  (64 bits)


ErrorEPC : Indicates the program counter on cold reset or soft reset, or in case of 
the NMI exception. 


Figure 6-13 ErrorEPC Register 
6.4 Exception Details


This section describes the processor exceptions (cause, processing, manipulation). 


6.4.1 Exception Types 
This section gives sample exception handler operations for the following 
exception types: 
• Cold Reset
• Soft Reset
• nonmaskable interrupt (NMI)
• remaining processor exceptions


In normal operation, when the EXL and ERL bits in the Status register are both 0, the KSU bits in the Status register select User, Supervisor, or Kernel mode. If either the EXL or the ERL bit is 1, the processor is in Kernel mode.


When an exception occurs, the processor sets the EXL bit to 1 and enters Kernel mode. In most cases, the exception handler clears EXL to 0 after saving the necessary information. The handler sets EXL to 1 again before restoring that information, so that the saved information is not lost if another exception occurs during the restore.


When execution returns from exception processing, the EXL bit is cleared to 0. For details, refer to the ERET instruction in Chapter 16 CPU Instruction Set Details.


6.4.2 Exception Vector Locations 


The Cold Reset, Soft Reset, and NMI exceptions are always vectored to: 
• location 0xBFC0 0000 in 32-bit mode
• location 0xFFFF FFFF BFC0 0000 in 64-bit mode


These addresses are in an uncached, TLB-unmapped area.


Addresses for the remaining exceptions are a combination of a vector offset and a 
base address. 


The 64-bit mode and 32-bit mode exception vector base addresses and their offsets are shown next.
Table 6-3 64-Bit Mode Exception Vector Base Addresses

Vector                            Base Address                            Vector Offset
Cold Reset, Soft Reset, and NMI   0xFFFF FFFF BFC0 0000                   0x0000
                                  (BEV bit is automatically set to 1.)
TLB Miss, EXL = 0                 0xFFFF FFFF 8000 0000 (BEV = 0)         0x0000
XTLB Miss, EXL = 0                0xFFFF FFFF BFC0 0200 (BEV = 1)         0x0080
Other                                                                     0x0180


Table 6-4 32-Bit Mode Exception Vector Base Addresses

Vector                            Base Address                            Vector Offset
Cold Reset, Soft Reset, and NMI   0xBFC0 0000                             0x0000
                                  (BEV bit is automatically set to 1.)
TLB Miss, EXL = 0                 0x8000 0000 (BEV = 0)                   0x0000
XTLB Miss, EXL = 0                0xBFC0 0200 (BEV = 1)                   0x0080
Other                                                                     0x0180

In both tables, the base address selected by BEV applies to the TLB Miss, XTLB Miss, and Other vectors.


E.g. TLB Miss vector (EXL = 0): When BEV = 0, the vector base address for this exception vector is in kseg0 (cached, TLB-unmapped space): 0x8000 0000 in 32-bit mode and 0xFFFF FFFF 8000 0000 in 64-bit mode.


When BEV = 1, the vector base address for this exception vector is in kseg1 (uncached, TLB-unmapped space): 0xBFC0 0200 in 32-bit mode and 0xFFFF FFFF BFC0 0200 in 64-bit mode. Because this space is TLB-unmapped, the exception bypasses the TLB.


E.g. General Exception vector: When BEV = 0, the address of this exception vector is in kseg0 (cached, TLB-unmapped space): 0x8000 0180 in 32-bit mode and 0xFFFF FFFF 8000 0180 in 64-bit mode.


When BEV = 1, the address of this exception vector is in kseg1 (uncached, TLB-unmapped space): 0xBFC0 0380 in 32-bit mode and 0xFFFF FFFF BFC0 0380 in 64-bit mode.


This space is an uncached and TLB unmapped space, allowing the exception 
handler to bypass the cache and TLB. 
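The vector selection described in the two tables and the examples above can be summarized in one function. This C sketch returns the 64-bit address. The 32-bit mode address is its low 32 bits, since the 64-bit form is the sign extension. The sketch also assumes, as described for TLB Miss exceptions in 6.4.8, that TLB and XTLB Miss exceptions use the common 0x180 offset when EXL = 1. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { VEC_RESET_NMI, VEC_TLB_MISS, VEC_XTLB_MISS, VEC_OTHER } vec_kind;

/* Returns the sign-extended 64-bit vector address. */
uint64_t exception_vector(vec_kind kind, bool bev, bool exl)
{
    if (kind == VEC_RESET_NMI)
        return UINT64_C(0xFFFFFFFFBFC00000);     /* offset 0x0000 */

    uint64_t base = bev ? UINT64_C(0xFFFFFFFFBFC00200)   /* kseg1 */
                        : UINT64_C(0xFFFFFFFF80000000);  /* kseg0 */
    uint64_t offset = 0x180;                     /* common vector */
    if (!exl && kind == VEC_TLB_MISS)  offset = 0x000;
    if (!exl && kind == VEC_XTLB_MISS) offset = 0x080;
    return base + offset;
}
```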




6.4.3 Priority of Exceptions 


While more than one exception can occur for a single instruction, only the 
exception with the highest priority is reported. 


The priority is as follows: 


Table 6-5 Exception Priority Order 


Cold Reset (highest priority)
Soft Reset
Nonmaskable Interrupt (NMI)
Address error (instruction fetch)
TLB/XTLB miss (instruction fetch)
TLB invalid (instruction fetch)
Bus error (instruction fetch)
System Call
Breakpoint
Coprocessor Unusable
Reserved Instruction
Trap
Integer overflow
Floating-Point Exception
Address error (data access)
TLB/XTLB miss (data access)
TLB invalid (data access)
TLB modification (data write)
Watch
Bus error (data access)
Interrupt (lowest priority)
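Table 6-5 amounts to a fixed ordering in which the first pending condition wins. The C model below encodes that order as an enum and picks the highest-priority bit from a mask. It illustrates the rule only, not how the hardware implements it. All names are hypothetical.

```c
#include <stdint.h>

/* Table 6-5 order, highest priority first. */
typedef enum {
    EXC_COLD_RESET, EXC_SOFT_RESET, EXC_NMI,
    EXC_ADE_IFETCH, EXC_TLB_MISS_IFETCH, EXC_TLB_INVALID_IFETCH,
    EXC_IBE, EXC_SYSCALL, EXC_BREAKPOINT, EXC_CPU_UNUSABLE,
    EXC_RESERVED_INSTR, EXC_TRAP, EXC_OVERFLOW, EXC_FPE,
    EXC_ADE_DATA, EXC_TLB_MISS_DATA, EXC_TLB_INVALID_DATA,
    EXC_TLB_MOD, EXC_WATCH, EXC_DBE, EXC_INTERRUPT,
    EXC_COUNT,
    EXC_NONE = -1
} exc_prio;

/* 'pending' has bit n set if condition n (an exc_prio value) occurred
 * for the current instruction. Only the highest-priority one is reported. */
exc_prio reported_exception(uint32_t pending)
{
    for (int n = 0; n < EXC_COUNT; n++)
        if (pending & (UINT32_C(1) << n))
            return (exc_prio)n;
    return EXC_NONE;
}
```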


In the following sections, the handling of each exception by hardware is described under "Processing", and its handling by software is described under "Servicing".
6.4.4 Cold Reset Exception


Cause 


The Cold Reset exception occurs when the ColdReset signal is asserted and then 
deasserted. This exception is not maskable. 


Processing 


The CPU provides a special interrupt vector for this reset exception: 
• location 0xBFC0 0000 in 32-bit mode
• location 0xFFFF FFFF BFC0 0000 in 64-bit mode


The Cold Reset vector resides in unmapped and uncached CPU address space, so 
the hardware need not initialize the TLB or the cache to process this exception. It 
also means the processor can fetch and execute instructions while the caches and 
virtual memory are in an undefined state. 


The contents of all registers in the CPU are undefined when this exception occurs, 
except for the following register fields: 


• The TS, SR, and RP bits of the Status register and the EP(3:0) bits of the Config register are cleared to 0.
• The ERL and BEV bits of the Status register and the BE bit of the Config register are set to 1.
• The Random register is set to the upper-limit value (31).
• The EC(2:0) bits of the Config register are set to the contents of the DivMode(1:0)* pins.

* In the VR4300 and VR4305. In the VR4310, DivMode(2:0).


Servicing 
The Cold Reset exception is serviced by:
• initializing all processor registers, coprocessor registers, TLB, caches, and the memory system
• performing diagnostic tests
• bootstrapping the operating system


6.4.5 Soft Reset Exception 


Cause 


A Soft Reset (sometimes called Warm Reset) occurs when the Reset pin is asserted for more than 16 MasterClock cycles and then deasserted while the ColdReset signal remains deasserted.


A Soft Reset immediately resets all state machines and sets the SR bit of the Status register. Execution begins at the reset vector when a Soft Reset occurs.


This exception is not maskable. 


Processing 


The CPU provides a special interrupt vector for this exception (same location as Cold Reset):
• location 0xBFC0 0000 in 32-bit mode
• location 0xFFFF FFFF BFC0 0000 in 64-bit mode


This vector is located within unmapped and uncached address space, so that the 
cache and TLB need not be initialized to process this exception. When a Soft 
Reset occurs, the SR bit of the Status register is set to distinguish this exception 
from a Cold Reset exception. 
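Because Cold Reset, Soft Reset, and NMI share one vector, reset code typically inspects SR first. Below is a minimal C sketch. It assumes SR is Status bit 20, as in the R4000-family layout; this section does not state the position. SR = 1 does not by itself separate a Soft Reset from an NMI, so the sketch reports those two together. Names are hypothetical.

```c
#include <stdint.h>

#define SR_SR (1u << 20)   /* assumed Status SR bit position (R4000 family) */

typedef enum { ENTRY_COLD_RESET, ENTRY_SOFT_RESET_OR_NMI } reset_entry;

/* At the shared reset vector: SR = 0 means Cold Reset; SR = 1 means
 * Soft Reset or NMI. Distinguishing those two is outside this sketch. */
reset_entry classify_reset_entry(uint32_t status)
{
    return (status & SR_SR) ? ENTRY_SOFT_RESET_OR_NMI : ENTRY_COLD_RESET;
}
```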


When this exception occurs, the contents of all registers are preserved except for: 


• If the ERL bit of the Status register is 0, the program counter value at the time of the exception is saved in the ErrorEPC register.
• The TS and RP bits of the Status register are cleared to 0.
• The ERL, SR, and BEV bits of the Status register are set to 1.


Because a Soft Reset can abort cache operations and system interface accesses, the cache and memory state is undefined when this exception occurs.


Servicing 


The Soft Reset exception is serviced by saving the current processor state for self- 
diagnostic purposes, and reinitializing the system in the same manner as the Cold 
Reset exception. 
6.4.6 Non-Maskable Interrupt (NMI) Exception


Cause 


The Non-maskable Interrupt (NMI) exception occurs in response to the falling edge of the NMI pin. An NMI can also be requested by an external write of 1 to bit 6 of the internal interrupt register through SysAD6.


Unlike all other interrupts, this interrupt is not maskable. It occurs regardless of the settings of the EXL, ERL, and IE bits in the Status register.


Processing 


The CPU provides a special interrupt vector for this exception (same location as Cold Reset):
• location 0xBFC0 0000 in 32-bit mode
• location 0xFFFF FFFF BFC0 0000 in 64-bit mode

This vector is located within unmapped and uncached address space so that the cache and TLB need not be initialized to process this exception. When an NMI exception occurs, the SR bit of the Status register is set to differentiate this exception from a Cold Reset exception.


Unlike Cold Reset and Soft Reset, but like other exceptions, NMI is taken only at instruction boundaries. The state of the caches and memory system is preserved by this exception.


When this exception occurs, the contents of all registers are preserved except for: 


• The program counter value at the time of the exception is saved in the ErrorEPC register.
• The TS bit of the Status register is cleared to 0.
• The ERL, SR, and BEV bits of the Status register are set to 1.


Servicing 


The NMI exception is serviced by saving the current processor state for self- 
diagnostic purposes, and reinitializing the system in the same manner as the Cold 
Reset exception. 
6.4.7 Address Error Exception


Cause 


The Address Error exception occurs when an attempt is made to do any of the following:
• Execute an LW or SW instruction on word data that is not aligned on a word boundary.
• Execute an LH or SH instruction on halfword data that is not aligned on a halfword boundary.
• Execute an LD or SD instruction on doubleword data that is not aligned on a doubleword boundary.
• Reference the Kernel address space from User or Supervisor mode.
• Reference the supervisor address space from User mode.
• Reference an address not in Kernel, Supervisor, or User space in 64-bit Kernel, Supervisor, or User mode.


This exception is not maskable. 


Processing 


The common exception vector is used for this exception. The AdEL or AdES code is set in the ExcCode field of the Cause register. The code indicates whether the exception was caused by an instruction reference (AdEL), a load operation (AdEL), or a store operation (AdES).


When this exception occurs, the BadVAddr register retains the virtual address that 
was not properly aligned or was referenced in protected address space. The 
contents of the VPN field of the Context and EntryHi registers are undefined, as 
are the contents of the EntryLo register. 


The EPC register contains the address of the instruction that caused the exception, 
unless this instruction is in a branch delay slot. If it is in a branch delay slot, the 
EPC register contains the address of the preceding branch instruction and the BD 
bit of the Cause register is set. 


Servicing 


The Kernel sends the process executing at the time a UNIX™ SIGSEGV (segmentation violation) signal. This error is usually fatal to the process incurring the exception.
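The alignment cases in the list at the start of this section reduce to checking the low address bits against the access size. The short C sketch below, with a hypothetical function name, is something a simulator or a fault handler might use to confirm an AdEL/AdES cause.

```c
#include <stdbool.h>
#include <stdint.h>

/* Natural-alignment rule: LH/SH need 2-byte, LW/SW 4-byte, and LD/SD
 * 8-byte alignment. 'size' is the access width in bytes (1, 2, 4, or 8);
 * byte accesses are never misaligned. */
bool misaligned(uint64_t vaddr, unsigned size)
{
    return size > 1 && (vaddr & (uint64_t)(size - 1)) != 0;
}
```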
6.4.8 TLB Exceptions


Three types of TLB exceptions can occur: 


• TLB Miss exception occurs when there is no TLB entry that matches an attempted reference to a mapped address space.
• TLB Invalid exception occurs when a virtual address reference matches a TLB entry that is marked invalid (V bit = 0).
• TLB Modification exception occurs when the virtual address of a store operation matches a TLB entry that is marked valid but is not dirty (the entry is not writable, D bit = 0). This exception occurs only for data writes and has a lower priority than the other TLB exceptions.


The following sections describe these TLB exceptions.
TLB Miss Exception (32-bit mode)/XTLB Miss Exception (64-bit mode) 


Cause 


The TLB (XTLB) Miss exception occurs when there is no TLB entry to match an 
address to be referenced. This exception is not maskable. 


Processing 


There are two special vectors for this exception: one for 32-bit mode and one for 64-bit mode. The UX, SX, and KX bits of the Status register determine whether the referenced user, supervisor, or Kernel address spaces are 32-bit or 64-bit spaces. All TLB Miss exceptions use these two special vectors when the EXL bit in the Status register is 0. They use the common exception vector when the EXL bit is 1.


This exception sets the TLBL or TLBS code in the ExcCode field of the Cause register. If the cause of the exception is an instruction reference or load operation, the TLBL code is set. If the cause is a store operation, the TLBS code is set.


When this exception occurs, the BadVAddr, Context, XContext and EntryHi 
registers hold the virtual address that failed address translation. The EntryHi 
register also contains the ASID from which the translation fault occurred. The 
Random register normally contains a valid location in which to place the 
replacement TLB entry. The contents of the EntryLo register are undefined. 


The EPC register contains the address of the instruction that caused the exception, 
unless this instruction is in a branch delay slot, in which case the EPC register 
contains the address of the preceding branch instruction and the BD bit of the 
Cause register is set. 
Servicing


To service this exception, the handler uses the contents of the Context or XContext register as a virtual address. From that address it loads the memory words that contain the physical page frames and access control bits for a pair of TLB entries. These words are written into the TLB through the EntryLo0, EntryLo1, and EntryHi registers.


The page frame and access control bits may reside on a page whose virtual address is not itself resident in the TLB. This condition is handled by allowing a TLB Miss exception to occur within the TLB Miss exception handler. This second exception goes to the common exception vector because the EXL bit of the Status register is set.
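The refill steps above can be modeled in C with plain structures. This is a simulation under stated assumptions, not handler code. The page table layout (one 16-byte pair of 8-byte PTEs per BadVPN2 value, with 4 KB pages), the structure names, and passing the Random value as a parameter are all illustrative. A real handler reads CP0 registers and uses the TLBWR instruction.

```c
#include <stdint.h>

/* Hypothetical in-memory model of the PTE table and the TLB. */
typedef struct { uint64_t entrylo0, entrylo1; } pte_pair;      /* 8-byte PTEs */
typedef struct { uint64_t entryhi, entrylo0, entrylo1; } tlb_entry;

#define TLB_ENTRIES 32   /* Random upper limit is 31 */

/* Refill for a 4 KB page size: BadVPN2 (VA bits 39:13) indexes the pair
 * table directly, as described for the XContext register. */
void tlb_refill(tlb_entry tlb[TLB_ENTRIES], const pte_pair *page_table,
                uint64_t bad_vaddr, uint64_t entryhi, unsigned random)
{
    uint64_t vpn2 = (bad_vaddr >> 13) & 0x7FFFFFFu;
    const pte_pair *p = &page_table[vpn2];
    tlb_entry *e = &tlb[random % TLB_ENTRIES];
    e->entryhi  = entryhi;        /* VPN2 and ASID, set by hardware on the miss */
    e->entrylo0 = p->entrylo0;    /* even page */
    e->entrylo1 = p->entrylo1;    /* odd page */
}
```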


TLB Invalid Exception 


Cause 


The TLB Invalid exception occurs when a virtual address reference matches a 
TLB entry that is marked invalid (TLB valid bit cleared). This exception is not 
maskable. 


Processing 


The common exception vector is used for this exception. The TLBL or TLBS code 
is set to the ExcCode field of the Cause register. If the cause of the exception is 
an instruction reference or load operation, the TLBL code is set; if the cause is a 
store operation, the TLBS code is set. 


When this exception occurs, the BadVAddr, Context, XContext and EntryHi 
registers contain the virtual address that failed address translation. The EntryHi 
register also contains the ASID from which the translation fault occurred. The 
contents of the EntryLo register are undefined. 


The EPC register contains the address of the instruction that caused the exception 
unless this instruction is in a branch delay slot, in which case the EPC register 
contains the address of the preceding branch instruction and the BD bit of the 
Cause register is set. 
Servicing
A TLB entry is typically marked invalid when one of the following is true:
• a virtual address does not exist
• the virtual address exists, but is not in main memory (a page fault)
• a trap is desired on any reference to the page (for example, to maintain a reference bit)


After removing the cause of a TLB Invalid exception, use the TLB Probe (TLBP) instruction to locate the TLB entry where the exception occurred. Then write a replacement entry to that location with the V bit set to 1.


TLB Modification Exception 


Cause 


The TLB Modification exception occurs when the TLB entry that matches the virtual address referenced by a store instruction is valid (V bit is 1) but not writable (D bit is 0). This exception occurs only when an attempt is made to write data, and its priority is low.


Processing 


The common exception vector is used for this exception, and the Mod code is set 
to the ExcCode field in the Cause register. 


When this exception occurs, the BadVAddr, Context, XContext and EntryHi 
registers contain the virtual address that failed address translation. The EntryHi 
register also contains the ASID from which the translation fault occurred. The 
contents of the EntryLo register are undefined. 


The EPC register contains the address of the instruction that caused the exception 
unless that instruction is in a branch delay slot, in which case the EPC register 
contains the address of the preceding branch instruction and the BD bit of the 
Cause register is set. 


Servicing 


The Kernel uses the failed virtual address or virtual page number to identify the 
corresponding access control bits. The page identified may or may not permit 
write accesses; if writes are not permitted, a write protection violation occurs. 


If write accesses are permitted, the page frame is marked dirty/writable by the 
Kernel in its own data structures. 
The TLBP instruction places the index of the TLB entry that must be altered into
the Index register. The EntryLo register is loaded with a word containing the 
physical page frame and access control bits (with the D bit set), and the contents 
of the EntryHi and EntryLo registers are written into the TLB. 
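The V and D bits that the TLB Invalid and TLB Modification handlers set can be expressed as simple masks. The positions below (G = bit 0, V = bit 1, D = bit 2 of EntryLo) come from the R4000-family EntryLo format and are not given in this section. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed EntryLo layout (R4000 family): G = bit 0, V = bit 1,
 * D = bit 2, C = bits 5:3, PFN = bits 29:6. */
#define ENTRYLO_G (1u << 0)
#define ENTRYLO_V (1u << 1)
#define ENTRYLO_D (1u << 2)

/* TLB Invalid servicing: the replacement entry has V = 1. */
uint64_t entrylo_make_valid(uint64_t lo) { return lo | ENTRYLO_V; }

/* TLB Modification servicing: if the page permits writes, set D = 1. */
uint64_t entrylo_make_dirty(uint64_t lo) { return lo | ENTRYLO_D; }

/* A store raises TLB Modification when the entry is valid but not dirty. */
int store_raises_mod(uint64_t lo)
{
    return (lo & ENTRYLO_V) && !(lo & ENTRYLO_D);
}
```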


6.4.9 Bus Error Exception 
Cause


A Bus Error exception is raised by board-level circuitry for events such as bus 
time-out, local bus parity errors, and invalid physical memory addresses or access 
types. This exception is not maskable. 


A Bus Error exception occurs only for a synchronous cache miss refill, uncached reference, or unbuffered write. Specifically, a Bus Error exception occurs if SysCmd(0) indicates that the data contains an error when it is transferred on the system bus, regardless of the direction of the transfer between the system and the processor. A local bus error in the system that results from a buffered write transaction is reported through an interrupt exception instead.


Processing 


The common exception vector is used for a Bus Error exception. The IBE or DBE code is set in the ExcCode field of the Cause register. If the cause of the exception is an instruction reference (instruction fetch), the IBE code is set. If the cause is a data reference (load/store), the DBE code is set.


The EPC register contains the address of the instruction that caused the exception, 
unless it is in a branch delay slot, in which case the EPC register contains the 
address of the preceding branch instruction and the BD bit of the Cause register is 
set. 
Servicing


The physical address at which the fault occurred can be computed from 
information available in the system control coprocessor registers. 


• If the IBE code in the Cause register is set (indicating an instruction fetch), the virtual address is contained in the EPC register, or is the contents of the EPC register + 4 if the BD bit of the Cause register is set.
• If the DBE code is set (indicating a load or store), the EPC register contains the virtual address of the instruction that caused the exception. If the BD bit of the Cause register is set, the EPC register instead contains the address of the preceding branch instruction, and the instruction that caused the exception is at the contents of the EPC register + 4.


The virtual address of the load and store reference can then be obtained by 
interpreting the instruction. The physical address can be obtained by using the 
TLBP instruction and reading the EntryLo register to compute the physical page 
number. 
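"Interpreting the instruction" means recomputing the effective address from the base register and the 16-bit signed offset of the load or store. The C sketch below assumes the standard MIPS I-type encoding (base in bits 25:21, offset in bits 15:0) and a hypothetical array of saved general-purpose registers.

```c
#include <stdint.h>

/* Data address of an I-type load/store: gpr[base] + sign-extended offset.
 * 'gpr' is a hypothetical copy of the saved register file. */
uint64_t loadstore_vaddr(uint32_t insn, const uint64_t gpr[32])
{
    unsigned base   = (insn >> 21) & 0x1Fu;       /* base register field */
    int64_t  offset = (int64_t)(insn & 0xFFFFu);  /* 16-bit immediate */
    if (offset & 0x8000)
        offset -= 0x10000;                        /* sign-extend */
    return gpr[base] + (uint64_t)offset;
}
```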


The process executing at the time of this exception is handed a UNIX SIGBUS 
(bus error) signal, which is usually fatal. 


6.4.10 System Call Exception 


Cause 
A System Call exception occurs during an attempt to execute the SYSCALL 
instruction. This exception is not maskable. 

Processing 


The common exception vector is used for this exception, and the Sys code is set to 
the ExcCode field in the Cause register. 


The EPC register contains the address of the SYSCALL instruction unless it is in 
a branch delay slot. If the SYSCALL instruction is in a branch delay slot, the EPC 
register contains the address of the preceding branch instruction and the BD bit of 
the Cause register is set; otherwise this bit is cleared. 
Servicing


When this exception occurs, control is transferred to the applicable system 
routine. 


To resume execution, the EPC register must be altered so that the SYSCALL 
instruction does not re-execute; this is accomplished by adding a value of 4 to the 
EPC register (EPC register + 4) before returning. 


If the SYSCALL instruction is in a branch delay slot, the preceding branch instruction must be decoded to determine the branch destination, and execution resumes there.


6.4.11 Breakpoint Exception 
Cause


A Breakpoint exception occurs when an attempt is made to execute the BREAK
instruction. This exception is not maskable. 


Processing 


The common exception vector is used for this exception, and the Bp code is set in the ExcCode field of the Cause register.


The EPC register contains the address of the BREAK instruction unless it is in a branch delay slot. If the BREAK instruction is in a branch delay slot, the EPC register contains the address of the preceding branch instruction and the BD bit of the Cause register is set; otherwise, the bit is cleared.


Servicing 


When the Breakpoint exception occurs, control is transferred to the applicable
system routine. Additional information can be passed in the unused bits of the
BREAK instruction (bits 25:6). The handler can obtain this information by loading
the instruction word at the address held in the EPC register as data. (If the BREAK
instruction resides in a branch delay slot, a value of 4 must be added to the contents
of the EPC register (EPC register + 4) to locate the instruction.)
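The following minimal C sketch shows how a handler might locate the BREAK instruction and extract its code field. `break_insn_addr` and `break_code` are hypothetical helpers, the BD bit is assumed to be bit 31 of Cause, and the load of the instruction word from the computed address is not shown.

```c
#include <stdint.h>

#define CAUSE_BD (UINT32_C(1) << 31)   /* BD bit of the Cause register */

/* Address of the BREAK instruction: EPC, or EPC + 4 when BD is set. */
static inline uint64_t break_insn_addr(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* The code field occupies bits 25:6 of the BREAK instruction word (20 bits). */
static inline uint32_t break_code(uint32_t insn)
{
    return (insn >> 6) & 0xFFFFFu;
}
```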


To resume execution, the EPC register must be altered so that the BREAK
instruction does not re-execute; this is accomplished by adding a value of 4 to the
EPC register (EPC register + 4) before returning. If the BREAK instruction is in a
branch delay slot, decode the branch instruction to obtain the branch destination,
and resume execution there.
6.4.12 Coprocessor Unusable Exception


Cause 


The Coprocessor Unusable exception occurs when an attempt is made to execute
a coprocessor instruction in either of the following cases:


• The corresponding coprocessor unit is not marked usable (the
corresponding bit of CU3 to CU1 in the Status register = 0).


• A CP0 instruction is executed in User or Supervisor mode while CP0
is not usable (CU0 bit of the Status register = 0).


This exception is not maskable. 


Processing 


The common exception vector is used for this exception, and the CpU code is set
in the ExcCode field of the Cause register.


The CE bits of the Cause register indicate which of the four coprocessors was 
referenced. 


The EPC register indicates the coprocessor instruction that caused an exception. 
If the coprocessor instruction that caused the exception is in a branch delay slot, 
the EPC register indicates the preceding branch instruction and the BD bit of the 
Cause register is set. 


Servicing 


The handler identifies the coprocessor unit that was referenced from the CE bits
of the Cause register, and then proceeds as follows:


• If the process is entitled to access the coprocessor, the coprocessor is
marked usable and execution of the process resumes.


• If the process is entitled to access the coprocessor but the
coprocessor does not exist or has failed, the handler can decode and
emulate the coprocessor instruction.


• If the BD bit is set in the Cause register, the branch instruction must
be decoded; the coprocessor instruction can then be emulated and
execution resumed by advancing the EPC register past the
coprocessor instruction.


• If the process is not entitled to access the coprocessor, the kernel
sends the current process a UNIX SIGILL/ILL_PRIVIN_FAULT
(illegal instruction/privileged instruction fault) signal. This
exception is usually fatal.
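The following sketch covers the first servicing case. It assumes the R4000-family bit positions: CE in Cause bits 29:28, and CU0 to CU3 in Status bits 28 to 31. The helper names are hypothetical. Writing the result back with MTC0 and restoring the coprocessor state are not shown.

```c
#include <stdint.h>

/* CE field: bits 29:28 of the Cause register (coprocessor number 0..3). */
static inline unsigned cause_ce(uint32_t cause)
{
    return (cause >> 28) & 0x3u;
}

/* CU0..CU3 occupy bits 28..31 of the Status register.
   Returns a Status value with CUn set (coprocessor n marked usable). */
static inline uint32_t status_mark_usable(uint32_t status, unsigned n)
{
    return status | (UINT32_C(1) << (28 + (n & 0x3u)));
}
```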


6.4.13 Reserved Instruction Exception 


Cause 


The Reserved Instruction exception occurs when one of the following conditions 
occurs: 


• an attempt is made to execute an instruction with an undefined
opcode (bits 31:26)


• an attempt is made to execute a SPECIAL instruction with an
undefined sub-opcode (bits 5:0)


• an attempt is made to execute a REGIMM instruction with an
undefined sub-opcode (bits 20:16)


• an attempt is made to execute 64-bit operations in 32-bit mode when
in User or Supervisor mode


64-bit operations are always valid in Kernel mode regardless of the value of the 
KX bit in the Status register. 
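The fields named in the list above can be extracted as follows. The helper names and `insn_selector` are hypothetical. Deciding whether a given sub-opcode is actually undefined would require the full opcode tables, which are not reproduced here.

```c
#include <stdint.h>

#define OP_SPECIAL 0u  /* primary opcode of SPECIAL instructions */
#define OP_REGIMM  1u  /* primary opcode of REGIMM instructions */

static inline unsigned insn_opcode(uint32_t insn) { return insn >> 26; }           /* bits 31:26 */
static inline unsigned insn_funct(uint32_t insn)  { return insn & 0x3Fu; }         /* bits 5:0   */
static inline unsigned insn_rt(uint32_t insn)     { return (insn >> 16) & 0x1Fu; } /* bits 20:16 */

/* Returns the field that selects the operation: the sub-opcode for SPECIAL
   and REGIMM instructions, otherwise the primary opcode itself. */
static inline unsigned insn_selector(uint32_t insn)
{
    switch (insn_opcode(insn)) {
    case OP_SPECIAL: return insn_funct(insn);
    case OP_REGIMM:  return insn_rt(insn);
    default:         return insn_opcode(insn);
    }
}
```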


This exception is not maskable. 


Processing 


The common exception vector is used for this exception, and the RI code is set in
the ExcCode field in the Cause register.


The EPC register indicates the reserved instruction that caused the exception
unless that instruction is in a branch delay slot. In that case, the EPC register
indicates the preceding branch instruction and the BD bit of the Cause register is set.


Servicing

All instructions in the MIPS ISA that are currently defined can be executed. 


The process executing at the time of this exception is sent a UNIX
SIGILL/ILL_RESOP_FAULT (illegal instruction/reserved operand fault) signal. This
exception is usually fatal.
6.4.14 Trap Exception


Cause 


The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE, TGEI,
TGEIU, TLTI, TLTIU, TEQI, or TNEI instruction results in a TRUE condition.
This exception is not maskable.
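The conditions tested by the trap instructions can be modeled in C as shown below. The enum is a hypothetical encoding. The operands are treated as 64-bit register values. For the immediate forms, the second operand is assumed to be the sign-extended 16-bit immediate; this also applies to TGEIU and TLTIU, which then compare the values as unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the six trap conditions. */
enum trap_cond { T_GE, T_GEU, T_LT, T_LTU, T_EQ, T_NE };

/* Returns true when the trap condition holds (the exception is raised). */
static bool trap_condition(enum trap_cond c, int64_t rs, int64_t rt)
{
    switch (c) {
    case T_GE:  return rs >= rt;                       /* signed compare   */
    case T_GEU: return (uint64_t)rs >= (uint64_t)rt;   /* unsigned compare */
    case T_LT:  return rs < rt;
    case T_LTU: return (uint64_t)rs < (uint64_t)rt;
    case T_EQ:  return rs == rt;
    case T_NE:  return rs != rt;
    }
    return false;
}
```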


Processing 


The common exception vector is used for this exception, and the Tr code is set in
the ExcCode field in the Cause register.


The EPC register indicates the Trap instruction that caused the exception. If the 
instruction is in a branch delay slot, the EPC register indicates the preceding 
branch instruction and the BD bit of the Cause register is set. 


Servicing 


The process executing at the time of a Trap exception is sent a UNIX
SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer overflow) signal by the
kernel. This exception is usually a fatal error.
6.4.15 Integer Overflow Exception


Cause 


An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD, 
DADDI or DSUB instruction results in a 2’s complement overflow. This 
exception is not maskable. 
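The 2's complement overflow condition for the 32-bit ADD and SUB instructions can be checked in software as sketched below. The 64-bit DADD and DSUB cases are analogous with 64-bit types. The function names are hypothetical. The arithmetic is done on unsigned values because signed overflow is undefined behavior in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* ADD: overflow when both operands have the same sign and the
   wrapped sum has the opposite sign. */
static bool add32_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                       /* wraps modulo 2^32 */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}

/* SUB: overflow when the operands differ in sign and the result's
   sign differs from the sign of the minuend. */
static bool sub32_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;                      /* wraps modulo 2^32 */
    return (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
}
```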


Processing 


The common exception vector is used for this exception, and the Ov code is set in 
the ExcCode field in the Cause register. 


The EPC register indicates the instruction that caused the exception. If the 
instruction is in a branch delay slot, the EPC register indicates the preceding 
branch instruction and the BD bit of the Cause register is set. 


Servicing 


The process executing at the time of the exception is sent a UNIX
SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer overflow) signal by the
kernel. This exception is usually a fatal error to the current process.


User's Manual U10504EJ7VOUM00 


6.4.16 Floating-Point Exception 


Cause 


The Floating-Point exception is generated by the floating-point coprocessor. This 
exception is not maskable. 


Processing 


The common exception vector is used for this exception, and the FPE code is set 
in the ExcCode field in the Cause register. 


The contents of the Floating-Point Control/Status register indicate the cause of 
this exception. 


The EPC register indicates the instruction that caused the exception if that
instruction is not in a branch delay slot. If the instruction is in a branch delay slot,
the EPC register indicates the preceding branch instruction and the BD bit of the
Cause register is set.


Servicing 


This exception is cleared by clearing the appropriate bit in the Floating-Point 
Control/Status register. 


For an unimplemented instruction exception, the Kernel must emulate the 
instruction; for other exceptions, the Kernel should pass the exception to the user 
program that caused the exception. 
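The following sketch shows the Cause-clearing step, using the FCR31 bit positions given in Figure 7-3 (Cause bits 17:12, with E at bit 17). Clearing the whole Cause field is only one possible policy, and the macro and function names are hypothetical. The actual read and write of FCR31 would use CFC1 and CTC1.

```c
#include <stdint.h>

#define FCR31_CAUSE_SHIFT 12
#define FCR31_CAUSE_MASK  (UINT32_C(0x3F) << FCR31_CAUSE_SHIFT) /* bits 17:12: E V Z O U I */
#define FCR31_CAUSE_E     (UINT32_C(1) << 17)                   /* unimplemented operation */

/* Value to write back with CTC1 so that the exception is not raised again. */
static inline uint32_t fcr31_clear_cause(uint32_t fcr31)
{
    return fcr31 & ~FCR31_CAUSE_MASK;
}

/* Nonzero when the kernel must emulate the instruction (E bit set). */
static inline int fcr31_needs_emulation(uint32_t fcr31)
{
    return (fcr31 & FCR31_CAUSE_E) != 0;
}
```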
6.4.17 Watch Exception


Cause


A Watch exception occurs when a load or store instruction references the physical 
address specified in the WatchLo/WatchHi registers. The exception is caused by 
the following instructions: a load instruction when the R bit is set in the WatchLo 
register; a store instruction when the W bit is set in the WatchLo register; a load or 
store instruction when both the R and W bits are set in the WatchLo register. 


The CACHE instruction never causes a Watch exception. 


The Watch exception is postponed while the EXL bit is set in the Status register.
The Watch exception can be masked by setting the EXL bit in the Status register to 1
or by clearing the R and W bits in the WatchLo register to 0.
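The match rule above can be modeled as follows. The sketch assumes the R4000-family WatchLo layout: physical address bits 31:3 held in WatchLo bits 31:3, R in bit 1, and W in bit 0. It compares addresses at doubleword granularity and ignores WatchHi. `watch_hit` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCHLO_W     UINT32_C(0x1)         /* bit 0: trap on stores */
#define WATCHLO_R     UINT32_C(0x2)         /* bit 1: trap on loads  */
#define WATCHLO_PADDR UINT32_C(0xFFFFFFF8)  /* bits 31:3: physical address */

/* Software model of the match condition for one load or store. */
static bool watch_hit(uint32_t watchlo, uint32_t paddr, bool is_store)
{
    if ((paddr & WATCHLO_PADDR) != (watchlo & WATCHLO_PADDR))
        return false;                       /* different doubleword */
    return is_store ? (watchlo & WATCHLO_W) != 0
                    : (watchlo & WATCHLO_R) != 0;
}
```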


Processing 


The common exception vector is used for this exception, and the Watch code is set 
in the ExcCode field in the Cause register. 


The EPC register indicates the load or store instruction that caused the exception
if that instruction is not in a branch delay slot. If the instruction is in a branch delay
slot, the EPC register indicates the preceding branch instruction and the BD bit of
the Cause register is set.


Servicing 


The Watch exception is a debugging aid; typically the exception handler transfers
control to a debugger, allowing the user to examine the situation. To continue, the
Watch exception must be masked so that the faulting instruction can execute, and
the Watch exception must then be re-enabled.


Because the contents of the WatchLo/WatchHi registers are undefined after
reset, software must initialize these registers (in particular, clear the R and W bits
to 0). Otherwise, an unintended Watch exception may occur.
6.4.18 Interrupt Exception


Cause 


The Interrupt exception occurs when one of the eight interrupt conditions (one for
the timer interrupt, five for hardware interrupts, and two for software interrupts) is
asserted. The significance of these interrupts depends on the specific system
implementation. Interrupt request signals from the pins are level-detected.


Each of the eight interrupts can be masked by clearing the corresponding bit in the
Int-Mask field of the Status register, and all eight interrupts can be masked
at once by clearing the IE bit, setting the EXL bit, or setting the ERL bit of the
Status register.
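Whether an interrupt is actually taken can be expressed as a single condition. The sketch below assumes the R4000-family bit positions: IP in Cause bits 15:8, IM in Status bits 15:8, and IE, EXL, and ERL in Status bits 0, 1, and 2. `interrupt_taken` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_IE      (UINT32_C(1) << 0)   /* interrupt enable */
#define SR_EXL     (UINT32_C(1) << 1)   /* exception level */
#define SR_ERL     (UINT32_C(1) << 2)   /* error level */
#define IP_IM_MASK UINT32_C(0xFF00)     /* Cause.IP / Status.IM: bits 15:8 */

/* True when interrupts are globally enabled and at least one
   pending request is not masked by its IM bit. */
static bool interrupt_taken(uint32_t status, uint32_t cause)
{
    if ((status & SR_IE) == 0 || (status & (SR_EXL | SR_ERL)) != 0)
        return false;
    return (cause & status & IP_IM_MASK) != 0;
}
```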


Processing 


The common exception vector is used for this exception, and the Int code is set in
the ExcCode field in the Cause register.


The IP field of the Cause register indicates the current interrupt requests. Because
interrupt request signals can be asserted or deasserted before this register is read,
more than one IP bit may be set at the same time, or bits may already have been
cleared, by the time the register is read.


If the instruction that was executing when the exception occurred is not in a branch
delay slot, the EPC register indicates that instruction. If the instruction is in a branch
delay slot, the EPC register indicates the preceding branch instruction and the BD bit
of the Cause register is set.


Servicing 


If the interrupt is caused by one of the two software-generated interrupts (SW1 or
SW0), the interrupt condition is cleared by setting the corresponding IP bit of the
Cause register to 0.


If an interrupt is generated by hardware, it is cleared by deasserting the interrupt
request signal that caused the interrupt.


If the timer interrupt request is generated, clear this interrupt either by clearing the
IP7 bit of the Cause register or by changing the contents of the Compare register.
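The following sketch shows one way a handler might choose a pending interrupt and clear a software interrupt. Servicing the highest-numbered request first is an assumed software policy that this manual does not specify. Both helper names are hypothetical, and the new Cause value would be written back with MTC0.

```c
#include <stdint.h>

#define CAUSE_IP_SHIFT 8   /* IP0..IP7 occupy Cause bits 8..15 */

/* Index (0..7) of the highest-numbered pending, unmasked interrupt, or -1. */
static int highest_pending_ip(uint32_t status, uint32_t cause)
{
    uint32_t pending = ((cause & status) >> CAUSE_IP_SHIFT) & 0xFFu;
    for (int n = 7; n >= 0; n--)
        if (pending & (UINT32_C(1) << n))
            return n;
    return -1;
}

/* Clears software interrupt SW0 (IP0) or SW1 (IP1); other IP bits are kept. */
static uint32_t clear_sw_interrupt(uint32_t cause, unsigned sw)
{
    return cause & ~(UINT32_C(1) << (CAUSE_IP_SHIFT + (sw & 1u)));
}
```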
6.5 Exception Handling and Servicing Flowcharts
The remainder of this chapter contains flowcharts for the following exceptions
and guidelines for their handlers:


• general purpose exception handling and guidelines for the
exception handler


• TLB/XTLB miss exception handling and guidelines for the
exception handler


• Cold Reset, Soft Reset, and NMI exception handling, and guidelines
for their handler.


Generally speaking, exceptions are first handled by hardware (“processing”) and
then handled by software (“servicing”).
(a) Exceptions other than Cold Reset, Soft Reset, NMI,
or TLB/XTLB Miss Handling (Hardware)


Start
1. Set FP Control/Status register
   EntryHi <- VPN2, ASID
   X/Context <- VPN2
   Set Cause register (ExcCode, CE)
   Set BadVAddr register
2. EXL = 0? (SR bit 1)                    ; Check for multiple exceptions
   Yes: Instruction in branch delay slot?
        Yes: BD bit of Cause register <- 1, EPC <- (PC - 4)
        No:  BD bit of Cause register <- 0, EPC <- PC
   No:  Go to step 3 (EPC and BD are not updated)
3. EXL <- 1                               ; Processor moves to Kernel mode and interrupts are disabled
4. BEV? (SR bit 22)
   = 0 (normal):    PC <- 0xFFFF FFFF 8000 0000 + 0x180 (unmapped, cached)
   = 1 (bootstrap): PC <- 0xFFFF FFFF BFC0 0200 + 0x180 (unmapped, uncached)
5. To General Purpose Exception Servicing Guidelines


Comments
The FP Control/Status register is set only if the respective exception occurs.
EntryHi and X/Context are set only for TLB Invalid, TLB Modification, and TLB Miss
exceptions. They are not set by bus error exceptions, however.


Remark Interrupts can be masked by IE or IM, and the Watch exception is postponed if EXL = 1.


Figure 6-14 General Purpose Exception Handler (1/2)
(b) General Purpose Exception Servicing Guidelines (Software)


General Purpose Exception Servicing Guidelines
1. MFC0 instructions executed: X/Context, EPC, Status, Cause
   ; Prevents TLB Modification, TLB Invalid, and TLB Miss exceptions from occurring
     by using an unmapped area
   ; EXL = 1, so Watch and Interrupt exceptions are disabled
   ; OS/System to avoid all other exceptions
   ; Only Cold Reset, Soft Reset, and NMI exceptions are possible
2. MTC0 instruction executed (set Status bits): KSU <- 00, EXL <- 0, IE = 1
   ; Optional: interrupts are enabled in Kernel mode
   ; After EXL = 0, all exceptions are allowed (except interrupts masked by IE or IM)
3. Check Cause register and jump to the appropriate service routine
4. TS bit of Status register = 0?        ; Optional: check only if double TLB miss
   No:  Reset the processor
   Yes: Each exception routine service   ; Save register file
5. MTC0 instructions executed: EPC, Status
6. ERET
   ; ERET is not allowed in the branch delay slot of another jump instruction
   ; The processor does not execute the instruction in the ERET instruction's
     branch delay slot
   ; PC <- EPC, EXL <- 0, LLbit <- 0


Figure 6-14 General Purpose Exception Handler (2/2)
(a) Hardware


Start
1. EntryHi <- VPN2, ASID
   X/Context <- VPN2
   Cause register setting (ExcCode)
   BadVAddr register setting
2. EXL = 0? (SR bit 1)                    ; Check for multiple exceptions
   Yes: Instruction in branch delay slot?
        Yes: BD bit of Cause register <- 1, EPC <- (PC - 4)
        No:  BD bit of Cause register <- 0, EPC <- PC
        XTLB Miss exception: Vec. Off. = 0x080
        TLB Miss exception:  Vec. Off. = 0x000
   No:  General Purpose exception: Vec. Off. = 0x180
3. EXL <- 1                               ; Processor moves to Kernel mode and interrupts are disabled
4. BEV? (SR bit 22)
   = 0 (normal):    PC <- 0xFFFF FFFF 8000 0000 + Vec. Off. (unmapped, cached)
   = 1 (bootstrap): PC <- 0xFFFF FFFF BFC0 0200 + Vec. Off. (unmapped, uncached)
5. To TLB/XTLB Exception Servicing Guidelines


Figure 6-15 TLB/XTLB Miss Exception Handler (1/2)
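The vector selection drawn in Figures 6-14 (1/2) and 6-15 (1/2) can be condensed into one function. This is a sketch only. `exception_vector` and its `exc_kind` argument are hypothetical, and they cover only the general purpose, TLB Miss, and XTLB Miss cases shown in those figures.

```c
#include <stdbool.h>
#include <stdint.h>

enum exc_kind { EXC_TLB_MISS, EXC_XTLB_MISS, EXC_GENERAL };

/* Vector address as selected by the hardware flowcharts. */
static uint64_t exception_vector(enum exc_kind kind, bool exl, bool bev)
{
    uint64_t base = bev ? UINT64_C(0xFFFFFFFFBFC00200)   /* bootstrap, uncached */
                        : UINT64_C(0xFFFFFFFF80000000);  /* normal, cached */
    uint64_t off;

    if (exl || kind == EXC_GENERAL)
        off = 0x180;   /* with EXL = 1, TLB misses also use the general vector */
    else if (kind == EXC_XTLB_MISS)
        off = 0x080;
    else
        off = 0x000;   /* 32-bit TLB Miss */
    return base + off;
}
```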
(b) TLB/XTLB Exception Servicing Guidelines (Software)


TLB/XTLB Exception Servicing Guidelines
1. MFC0 instruction executed: X/Context
   ; Prevents TLB Modification, TLB Invalid, and TLB Miss exceptions from occurring
     by using an unmapped area
   ; EXL = 1, so Watch and Interrupt exceptions are disabled
   ; OS/System to avoid all other exceptions
   ; Only Cold Reset, Soft Reset, and NMI exceptions are possible
2. Each exception routine servicing
   ; Load the physical address corresponding to the virtual address loaded in the
     X/Context register into the EntryLo register, and write it into the TLB
   ; A TLB miss can occur again while the data or instruction address is being mapped.
     Because EXL is 1, the processor then jumps to the general purpose exception vector.
     (Either process the TLB miss in the general purpose exception handler, or return
     to the user program by using the ERET instruction and generate the TLB Miss
     exception again.)
3. ERET
   ; ERET is not allowed in the branch delay slot of another jump instruction
   ; The processor does not execute the instruction in the ERET instruction's
     branch delay slot
   ; PC <- EPC, EXL <- 0, LLbit <- 0


Figure 6-15 TLB/XTLB Miss Exception Handler (2/2)
Cold Reset, Soft Reset & NMI Exception


Processing Guidelines (HW)

Soft Reset or NMI Exception:
   Status: RP <- 0 (soft reset), BEV <- 1, TS <- 0, SR <- 1, ERL <- 1

Cold Reset Exception:
   Random <- 31
   Wired <- 0
   Update bits 31:4 of the Config register
   Status: RP <- 0, BEV <- 1, TS <- 0, SR <- 0, ERL <- 1

Then:
   ErrorEPC <- PC
   PC <- 0xFFFF FFFF BFC0 0000


Servicing Guidelines (SW)

SR bit of Status register = 1?
   Yes: NMI exception?
        Yes: NMI exception routine service, then ERET (optional)
        No:  Servicing of soft reset exception routine
   No:  Servicing of cold reset exception routine


Comments
There is no indication from the processor to differentiate between NMI and Soft
Reset; there must be a system-level indication.


Figure 6-16 Cold Reset, Soft Reset & NMI Exception Handler
7.1 Overview


All floating-point instructions, as defined in the MIPS ISA for the floating-point
coprocessor, CP1, can be processed by the VR4300. Logically, the Floating-Point
Arithmetic Unit (FPU) exists as an individual coprocessor; however, unlike that
of the VR4400, the VR4300 FPU is physically integrated into the Integer
Arithmetic Unit (CPU). The CPU and the FPU use a common datapath, and FPU
instructions are fully implemented in the CPU hardware. Unlike the VR4400
implementation, VR4300 integer instructions cannot be executed until a
multicycle floating-point instruction has been completed.


The execution of floating-point instructions can be disabled by the coprocessor
usability bit CU1 in the Status register of the System Control Coprocessor (CP0).


7.2 FPU Programming Model 


This section describes the FPU register set, the memory and data structures, and
the usable general purpose registers, and then describes each FPU register in detail.


7.2.1 Floating-Point General Purpose Register (FGR) 




The FPU has one set of floating-point general purpose registers (FGRs) and two
control registers (Control/Status register: FCR31; Implementation/Revision
register: FCR0). The general purpose registers can be used in the following three
ways:


• As 32 general purpose registers (32 FGRs), each of which is 32 bits
wide, when the FR bit in the Status register equals 0; or as 32 general
purpose registers (32 FGRs), each of which is 64 bits wide, when FR
equals 1. The CPU accesses these registers through load, store, and
transfer instructions.


• As 16 floating-point registers (FPRs) (see the next section for a
description of FPRs), each of which is 64 bits wide, when the FR bit
in the Status register equals 0. The FPRs hold values in either single-
or double-precision floating-point format. Each FPR corresponds to
adjacently numbered FGRs as shown in Figure 7-1.


• As 32 floating-point registers (FPRs) (see the next section for a
description of FPRs), each of which is 64 bits wide, when the FR bit
in the Status register equals 1. The FPRs hold values in either single-
or double-precision floating-point format. Each FPR corresponds to
an individual FGR as shown in Figure 7-1.
When FR = 0 (16 FPRs, each formed from two 32-bit FGRs):
  FPR0  : FGR0 (low-order), FGR1 (high-order)
  FPR2  : FGR2 (low-order), FGR3 (high-order)
  ...
  FPR28 : FGR28 (low-order), FGR29 (high-order)
  FPR30 : FGR30 (low-order), FGR31 (high-order)

When FR = 1 (32 FPRs, each formed from one 64-bit FGR):
  FPR0  : FGR0
  FPR1  : FGR1
  ...
  FPR30 : FGR30
  FPR31 : FGR31

Floating-Point Control Registers (FCR), 32 bits each:
  Implementation/Revision Register (FCR0)
  Control/Status Register (FCR31)


Figure 7-1 FPU Registers
7.2.2 Floating-Point Registers (FPR)


CP1 provides: 


• 16 Floating-Point registers (FPRs) when the FR bit in the Status
register equals 0, or


• 32 Floating-Point registers (FPRs) when the FR bit in the Status
register equals 1.


The FPRs are logical 64-bit registers that hold floating-point values during
floating-point operations, and they are accessed by floating-point arithmetic
instructions. Physically, the FPRs are formed from the floating-point general
purpose registers (FGRs). When the FR bit in the Status register equals 0, each FPR
is formed from two 32-bit FGRs. When the FR bit in the Status register equals 1,
each FPR is formed from a single 64-bit FGR.


The FPRs hold values in either single- or double-precision floating-point format.
If the FR bit equals 0, only even register numbers (the lower-numbered register of
each pair, as shown in Figure 7-1) can be used to address FPRs. When the FR bit
equals 1, all FPR register numbers are valid. If the FR bit equals 0, a double-precision
floating-point operation uses the FGRs in pairs. Thus, in a double-precision
operation, selecting Floating-Point Register 0 (FPR0) actually uses the adjacent
Floating-Point General Purpose registers FGR0 and FGR1.
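When FR = 0, the pairing of a double-precision FPR with two FGRs can be illustrated by splitting a C `double` into its two 32-bit halves. This assumes the host `double` is an IEEE 754 double-precision value. `split_double` and `join_double` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Model of FR = 0: double-precision FPR n (n even) occupies FGR n
   (low-order 32 bits) and FGR n+1 (high-order 32 bits). */
static void split_double(double d, uint32_t *fgr_even, uint32_t *fgr_odd)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);        /* reinterpret the bit pattern */
    *fgr_even = (uint32_t)bits;            /* low-order word  -> FGR n     */
    *fgr_odd  = (uint32_t)(bits >> 32);    /* high-order word -> FGR n + 1 */
}

/* Reassembles the 64-bit value from the two FGR halves. */
static double join_double(uint32_t fgr_even, uint32_t fgr_odd)
{
    uint64_t bits = ((uint64_t)fgr_odd << 32) | fgr_even;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```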
7.2.3 Floating-Point Control Registers (FCRs)
The FPU in the VR4000 Series (excluding the VR4100) has 32 control registers. In
the VR4300, the following two FCRs are valid.


• The Control/Status register (FCR31) controls and monitors
exceptions, holds the result of compare operations, and establishes
rounding modes.


• The Implementation/Revision register (FCR0) holds revision
information about the FPU.


Table 7-1 lists the assignments of the FCRs. 


Table 7-1 Floating-Point Control Register Assignments


FCR Number       Use
FCR0             Coprocessor implementation/revision register
FCR1 to FCR30    Reserved
FCR31            Rounding mode, cause, exception enables, and flags


7.2.4 Control/Status Register (FCR31) 


The Control/Status register (FCR31) is a read/write register that holds control
data and status data. FCR31 controls the rounding mode and enables floating-point
exceptions. It also indicates the exceptions caused by the most recently executed
instruction, and the exceptions that were masked and therefore did not occur.
Figure 7-2 shows the configuration of FCR31.


Control/Status Register (FCR31)


Bits     Field                       Width
31:25    0                           7
24       FS                          1
23       C                           1
22:18    0                           5
17:12    Cause (E, V, Z, O, U, I)    6
11:7     Enables (V, Z, O, U, I)     5
6:2      Flags (V, Z, O, U, I)       5
1:0      RM                          2


Figure 7-2 Control/Status Register Bit Assignments
Cause bits:   E (bit 17), V (bit 16), Z (bit 15), O (bit 14), U (bit 13), I (bit 12)
Enable bits:  V (bit 11), Z (bit 10), O (bit 9), U (bit 8), I (bit 7)
Flag bits:    V (bit 6), Z (bit 5), O (bit 4), U (bit 3), I (bit 2)

E : Unimplemented Operation
V : Invalid Operation
Z : Division by Zero
O : Overflow
U : Underflow
I : Inexact Operation


Figure 7-3 Control/Status Register (FCR31) Cause, Enable, and Flag Bit Fields
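The field positions in Figures 7-2 and 7-3 translate directly into accessors. The helper and constant names below are hypothetical. Each 5-bit group is returned with V in bit 4 down to I in bit 0, and the Cause group adds E in bit 5.

```c
#include <stdint.h>

static inline unsigned fcr31_rm(uint32_t v)      { return v & 0x3u; }          /* bits 1:0   */
static inline unsigned fcr31_flags(uint32_t v)   { return (v >> 2) & 0x1Fu; }  /* bits 6:2   */
static inline unsigned fcr31_enables(uint32_t v) { return (v >> 7) & 0x1Fu; }  /* bits 11:7  */
static inline unsigned fcr31_cause(uint32_t v)   { return (v >> 12) & 0x3Fu; } /* bits 17:12 */
static inline unsigned fcr31_c(uint32_t v)       { return (v >> 23) & 0x1u; }  /* bit 23     */
static inline unsigned fcr31_fs(uint32_t v)      { return (v >> 24) & 0x1u; }  /* bit 24     */

/* Bit meanings within each group returned above. */
enum { FPE_I = 1u << 0, FPE_U = 1u << 1, FPE_O = 1u << 2,
       FPE_Z = 1u << 3, FPE_V = 1u << 4, FPE_E = 1u << 5 };
```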


The contents of FCR31 and FCR0 can be read by using the CFC1 instruction.


The bits of FCR31 can be set or cleared by using the CTC1 instruction. FCR0 is
a read-only register. The contents of the register being written are undefined
while the instruction immediately following the write instruction is executed,
because the pipeline does not interlock.


IEEE 754 specifies that exceptions be detected during floating-point operations,
that flags be set, and that an exception handler be called when an exception occurs.
In the MIPS architecture, these requirements are implemented by the Cause, Enable,
and Flag bits of the Control/Status register. The Flag bits correspond to the exception
status flags of IEEE 754, and the Cause and Enable bits correspond to the exception
handler of IEEE 754.


Each bit of FCR31 is described next.
FS Bit


The FS bit enables a value that cannot be normalized (a denormalized number) to
be flushed. When the FS bit is set and the Enable bits for the underflow and inexact
exceptions are not set, a denormalized result does not cause an unimplemented
operation exception but is flushed instead. Whether the flushed result is 0 or the
minimum normalized value depends on the rounding mode (refer to Table 7-2).
If the result is flushed, the Flag and Cause bits are set for the underflow and inexact
exceptions.


Table 7-2 Flush Values of Denormalized Number Results


Denormalized       Flushed Result (by Rounding Mode)
Number Result      RN      RZ      RP                  RM
Positive           +0      +0      $+2^{E_{min}}$      +0
Negative           -0      -0      -0                  $-2^{E_{min}}$


C Bit


When a floating-point Compare operation takes place, the result is stored in bit 23,
the Condition bit. The C bit is set to 1 if the condition is true and cleared to
0 if the condition is false. Bit 23 is affected only by compare and CTC1
instructions.


Cause, Flag, and Enable Fields 


Figure 7-3 illustrates the Cause, Enable, and Flag fields of the FCR31. 


The Cause and Flag fields are updated by all conversion, computational (except
MOV.fmt), CTC1, reserved, and unimplemented operation instructions. All other
instructions have no effect on these fields.


Cause Bits 


Bits 17:12 of FCR31 contain the Cause bits, which reflect the results of the most
recently executed floating-point instruction. The Cause bits are a logical
extension of the CP0 Cause register; they identify the exceptions raised by the last
floating-point operation and generate an exception if the corresponding Enable bit
is set. If more than one exception occurs on a single instruction, each appropriate
bit is set.
The Cause bits are updated by floating-point operations (except load, store,
and transfer instructions). The unimplemented operation (E) bit is set
to 1 if software emulation is required; otherwise, it remains 0. The other bits are
set to 1 or 0 to indicate the occurrence or non-occurrence (respectively) of an
IEEE 754 exception.


If a floating-point exception occurs, the result of the operation is not stored,
and only the Cause bits are affected. The type of exception caused by the
most recently executed floating-point operation can be identified by reading
the Cause bits.


Enable Bits 


A floating-point exception is generated any time a Cause bit and the
corresponding Enable bit are set. An exception occurs as soon as a floating-point
operation sets an enabled Cause bit. An exception also occurs when a CTC1
instruction sets both a Cause bit and its corresponding Enable bit.


There is no Enable bit for the unimplemented operation (E) bit. An
unimplemented operation always generates a floating-point exception.
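The exception rule above, including the special case of the E bit, can be written as a single test. `fpe_raised` is a hypothetical name. The bit positions follow Figure 7-3.

```c
#include <stdbool.h>
#include <stdint.h>

/* A floating-point exception is raised when E (Cause bit 17) is set, or when
   any of Cause bits 16:12 (V Z O U I) is set together with the matching
   Enable bit in bits 11:7. */
static bool fpe_raised(uint32_t fcr31)
{
    uint32_t cause   = (fcr31 >> 12) & 0x1Fu;    /* V Z O U I cause bits  */
    uint32_t enables = (fcr31 >> 7)  & 0x1Fu;    /* V Z O U I enable bits */
    bool unimplemented = (fcr31 >> 17) & 0x1u;   /* E has no Enable bit   */
    return unimplemented || (cause & enables) != 0;
}
```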


Before returning from a floating-point exception, software must first clear the
enabled Cause bits to prevent the exception from being repeated. As a result,
User mode programs cannot observe the Cause bits that were set. To make this
information available to a handler in User mode, save the contents of the
Control/Status register and then call the User mode handler.


If a Cause bit is set but the corresponding Enable bit is not set, no floating-point
exception occurs and the default result defined by IEEE 754 is stored. In this case,
whether the exceptions were caused by the immediately preceding floating-point
operation can be determined by reading the Cause bits.


Flag Bits 


The Flag bits are cumulative and indicate the exceptions that have been raised since
reset. A Flag bit is set to 1 if an IEEE 754 exception is raised but the corresponding
exception is disabled; otherwise, it remains unchanged. The Flag bits
are never cleared as a side effect of floating-point operations; however, they can
be set or cleared by writing a new value into FCR31 using a CTC1 instruction.


Rounding Mode Control Bits 


Bits 1 and 0 of FCR31 constitute the Rounding Mode (RM) bits. These
bits specify the rounding mode that the FPU uses for all floating-point operations.
Table 7-3 Rounding Mode Control Bits


RM Bits          Mnemonic   Description
Bit 1   Bit 0
0       0        RN         Round result to nearest representable value; round to the value
                            with least-significant bit 0 when the two nearest representable
                            values are equally near.
0       1        RZ         Round toward 0: round to the value closest to and not greater in
                            magnitude than the infinitely precise result.
1       0        RP         Round toward +∞: round to the value closest to and not less than
                            the infinitely precise result.
1       1        RM         Round toward -∞: round to the value closest to and not greater
                            than the infinitely precise result.
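The four modes in Table 7-3 can be illustrated by rounding a value to an integer. This is an analogy only: the FPU applies the same rules at the precision of the destination format. `round_integral` is a hypothetical helper that assumes |x| < 2^62, so that the conversions through `long long` are exact.

```c
enum fpu_rm { RM_RN = 0, RM_RZ = 1, RM_RP = 2, RM_RM = 3 };  /* FCR31 bits 1:0 */

/* Rounds x to an integral value using the RM encoding of Table 7-3. */
static double round_integral(double x, unsigned rm)
{
    double t  = (double)(long long)x;       /* toward zero (RZ)      */
    double fl = (t > x) ? t - 1.0 : t;      /* toward -infinity (RM) */
    double ce = (t < x) ? t + 1.0 : t;      /* toward +infinity (RP) */

    switch (rm & 3u) {
    case RM_RZ: return t;
    case RM_RP: return ce;
    case RM_RM: return fl;
    default: {                              /* RN: nearest, ties to even */
        double diff = x - fl;
        if (diff < 0.5) return fl;
        if (diff > 0.5) return ce;
        return ((long long)fl % 2 == 0) ? fl : ce;   /* exact tie */
    }
    }
}
```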




7.2.5 Implementation/Revision Register (FCR0)


The Implementation/Revision register (FCR0) is a read-only register that holds the
implementation identification number and the implementation revision number of the
FPU. This information can be used to identify the coprocessor revision, determine the
performance level, and execute self-diagnostics.


Figure 7-4 shows the layout of the register. 


Implementation/Revision Register (FCR0)


Bits     Field    Width
31:16    0        16
15:8     Imp      8
7:0      Rev      8

Imp : Implementation number (0x0B)
Rev : Revision number in the form of y.x
0   : RFU. Returns zeroes when read.


Figure 7-4 Implementation/Revision Register


The implementation revision number is a value in the format y.x, where y is the
major revision number stored in bits 7:4 and x is the minor revision number
stored in bits 3:0. The revision of the chip can be identified by the implementation
revision number. However, not every change to the chip is reflected in the revision
number; conversely, a change in the revision number does not always reflect an
actual change to the chip. Therefore, design programs so that they do not depend on
the revision number in this register.
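The following sketch decodes FCR0 for logging or diagnostics, following Figure 7-4. The helper names are hypothetical, and the value used in the usage check below is illustrative only. As cautioned above, program behavior should not depend on the revision fields.

```c
#include <stdint.h>

/* Imp in bits 15:8; Rev in bits 7:0 as y.x (y in bits 7:4, x in bits 3:0). */
static inline unsigned fcr0_imp(uint32_t v)       { return (v >> 8) & 0xFFu; }
static inline unsigned fcr0_rev_major(uint32_t v) { return (v >> 4) & 0xFu; }
static inline unsigned fcr0_rev_minor(uint32_t v) { return v & 0xFu; }
```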
7.3 Floating-Point Formats


The FPU supports both 32-bit (single-precision) and 64-bit (double-precision)
IEEE 754 standard floating-point operations. The 32-bit single-precision format
has a 24-bit signed fraction field (s+f) and an 8-bit exponent (e), as shown in
Figure 7-5.


Bits     31       30:23       22:0
Field    s        e           f
         Sign     Exponent    Fraction
Width    1        8           23


Figure 7-5 Single-Precision Floating-Point Format


The double-precision format has a 53-bit signed fraction field (s+f) and an 11-bit 
exponent, as shown in Figure 7-6. 


Bits     63       62:52       51:0
Field    s        e           f
         Sign     Exponent    Fraction
Width    1        11          52


Figure 7-6 Double-Precision Floating-Point Format
As shown in the above figures, numbers in floating-point format are composed of
three fields:
• sign field, $s$
• exponent, $e = E + \text{bias}$
• fraction, $f = .b_1 b_2 \ldots b_{p-1}$ (the bits to the right of the binary point)
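The three fields of Figure 7-5 can be extracted from a C `float` as sketched below. This assumes the host `float` is an IEEE 754 single-precision value. The struct and function names are hypothetical. The unbiased exponent is E = e - 127.

```c
#include <stdint.h>
#include <string.h>

struct fp32_fields { unsigned s, e; uint32_t f; };

/* Splits a single-precision value into its s, e, and f fields. */
static struct fp32_fields fp32_decode(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the bit pattern */
    struct fp32_fields r = {
        .s = bits >> 31,                     /* bit 31     */
        .e = (bits >> 23) & 0xFFu,           /* bits 30:23 */
        .f = bits & 0x7FFFFFu                /* bits 22:0  */
    };
    return r;
}
```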


The range of the unbiased exponent $E$ includes every integer between the two
values $E_{min}$ and $E_{max}$, inclusive, together with two other reserved values:


• $E_{min} - 1$ (to encode $\pm 0$ and denormalized numbers)


• $E_{max} + 1$ (to encode $\pm\infty$ and NaNs [Not a Number])


For single- and double-precision formats, each representable nonzero numerical 
value has just one encoding. 


For single- and double-precision formats, the value of a number, v, is determined 
by the equations shown in Table 7-4. 
Table 7-4 Equations for Calculating Values in Single- and Double-Precision
Floating-Point Format


Type                        Equation
NaN (Not a Number)          if $E = E_{max}+1$ and $f \neq 0$, then $v$ is NaN, regardless of $s$
±∞ (Infinite number)        if $E = E_{max}+1$ and $f = 0$, then $v = (-1)^s \infty$
Normalized number           if $E_{min} \le E \le E_{max}$, then $v = (-1)^s \, 2^{E} (1.f)$
Denormalized number         if $E = E_{min}-1$ and $f \neq 0$, then $v = (-1)^s \, 2^{E_{min}} (0.f)$
±0 (Zero)                   if $E = E_{min}-1$ and $f = 0$, then $v = (-1)^s \, 0$


NaN (Not a Number) 


IEEE 754 specifies a floating-point value called NaN (Not a Number). A NaN is
not a numeric value; therefore, it is neither greater than nor less than any value.


For all floating-point formats, if $v$ is NaN, the most-significant bit of $f$ determines
whether the value is a signaling or quiet NaN: $v$ is a signaling NaN if the
most-significant bit of $f$ is set; otherwise, $v$ is a quiet NaN. Table 7-5 defines the
values for the format parameters.
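Table 7-4 and the NaN rule above can be combined into a classifier for single-precision bit patterns. For single precision, E = Emin - 1 corresponds to a biased exponent of 0, and E = Emax + 1 corresponds to a biased exponent of 255. The sketch works on raw bit patterns because most current hosts follow the opposite quiet/signaling convention, treating a NaN as quiet when the fraction MSB is set. The names are hypothetical.

```c
#include <stdint.h>

enum fp32_class { CLS_ZERO, CLS_DENORMAL, CLS_NORMAL, CLS_INFINITY, CLS_SNAN, CLS_QNAN };

/* Classifies a single-precision bit pattern according to Table 7-4;
   a NaN is signaling when the most-significant bit of f is set. */
static enum fp32_class fp32_classify(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;   /* biased exponent */
    uint32_t f = bits & 0x7FFFFFu;       /* fraction */

    if (e == 0xFFu) {                    /* E = Emax + 1 */
        if (f == 0)
            return CLS_INFINITY;
        return (f & 0x400000u) ? CLS_SNAN : CLS_QNAN;
    }
    if (e == 0)                          /* E = Emin - 1 */
        return f ? CLS_DENORMAL : CLS_ZERO;
    return CLS_NORMAL;
}
```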


Table 7-5 Floating-Point Format Parameter Values 
Parameter                   Format
                            Single     Double
$E_{max}$                   +127       +1023
$E_{min}$                   -126       -1022
Exponent bias               +127       +1023
Exponent width in bits      8          11
Integer bit                 hidden     hidden
Fraction width in bits      24         53
Format width in bits        32         64


Floating-Point Operations 


The minimum and maximum values that can be expressed in this floating-point format are shown in Table 7-6.


Table 7-6 Minimum and Maximum Floating-Point Values


Type                                                Value
Single-precision floating-point Minimum             1.40129846e-45
Single-precision floating-point Minimum (Normal)    1.17549435e-38
Single-precision floating-point Maximum             3.40282347e+38
Double-precision floating-point Minimum             4.9406564584124654e-324
Double-precision floating-point Minimum (Normal)    2.2250738585072014e-308
Double-precision floating-point Maximum             1.7976931348623157e+308
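The entries of Tables 7-5 and 7-6 follow from the exponent and fraction widths. As a sketch, assuming the usual IEEE754 relations (bias = 2^(w-1) - 1, Emax = bias, Emin = 1 - bias), the hypothetical helper format_params below derives them, taking the fraction width including the hidden bit (24 or 53) as Table 7-5 does:

```python
import math


def format_params(exp_bits: int, precision: int) -> dict:
    """Derive Table 7-5/7-6 values from the exponent width and the
    fraction width including the hidden bit."""
    bias = (1 << (exp_bits - 1)) - 1
    e_max, e_min = bias, 1 - bias
    return {
        "bias": bias,
        "e_max": e_max,
        "e_min": e_min,
        # Smallest denormalized value: only the last fraction bit set.
        "min_denormal": math.ldexp(1.0, e_min - (precision - 1)),
        # Smallest normalized value: 1.0 x 2^Emin.
        "min_normal": math.ldexp(1.0, e_min),
        # Largest finite value: 1.11...1 (binary) x 2^Emax.
        "max": math.ldexp(2.0 - 2.0 ** -(precision - 1), e_max),
    }


single = format_params(8, 24)
double = format_params(11, 53)
print("%.8e %.8e %.8e" % (single["min_denormal"], single["min_normal"], single["max"]))
# 1.40129846e-45 1.17549435e-38 3.40282347e+38
print("%.16e %.16e %.16e" % (double["min_denormal"], double["min_normal"], double["max"]))
# 4.9406564584124654e-324 2.2250738585072014e-308 1.7976931348623157e+308
```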
7.4 Fixed-Point Format


Fixed-point values are held in 2's complement format. Unsigned fixed-point values are not directly provided by the floating-point instruction set. Figure 7-7 illustrates the 32-bit fixed-point format and Figure 7-8 illustrates the 64-bit fixed-point format.
 31 | 30                  0
  s |           i
Sign        Integer
  1            31

s : sign bit
i : integer value (2's complement)


Figure 7-7 32-Bit Fixed-Point Format


 63 | 62                  0
  s |           i
Sign        Integer
  1            63

s : sign bit
i : integer value (2's complement)


Figure 7-8 64-Bit Fixed-Point Format
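A minimal sketch of the 2's complement encoding shown in Figures 7-7 and 7-8; to_fixed and from_fixed are hypothetical names, and width is 32 or 64:

```python
def to_fixed(value: int, width: int) -> int:
    """Encode a signed integer as a `width`-bit 2's complement pattern."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {width} bits")
    return value & ((1 << width) - 1)


def from_fixed(bits: int, width: int) -> int:
    """Decode a `width`-bit pattern; bit width-1 is the sign bit s."""
    sign_bit = 1 << (width - 1)
    return (bits & (sign_bit - 1)) - (bits & sign_bit)


print(hex(to_fixed(-1, 32)), from_fixed(0x80000000, 32))
```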
7.5 FPU Instruction Set Overview


All FPU instructions are 32 bits long, aligned on a word boundary. They can be 
divided into the following groups: 


• Load/Store/Transfer instructions move data between the FPU general purpose registers, the FPU control registers, the CPU, and memory.


• Conversion instructions perform conversion operations between the various data formats.


• Computational instructions perform arithmetic operations on floating-point values in FPU registers.


• Compare instructions compare the contents of registers and set the result in a condition bit of FCR31.


• FPU Branch instructions branch to the specified target if the specified coprocessor condition is met.


For details of each instruction, refer to Chapter 17 FPU Instruction Set Details. 


7.5.1 Floating-Point Load/Store/Transfer Instructions 


Loads/Stores from/to CP1 and Memory 


Loads/Stores from/to CP1 and memory are accomplished by using one of the 
following instructions: 


• Load Word To Coprocessor 1 (LWC1) or Store Word From Coprocessor 1 (SWC1) instructions, which reference a single 32-bit word of the FPU general purpose registers


• Load Doubleword To Coprocessor 1 (LDC1) or Store Doubleword From Coprocessor 1 (SDC1) instructions, which reference a 64-bit doubleword.


These load and store operations are unformatted; no format conversions are 
performed and therefore no floating-point exceptions can occur due to these 
operations. 
Transfers Between CP1 and CPU
Data can also be moved directly between CP1 general purpose registers and the CPU by using one of the following instructions:
• Move To Coprocessor 1 (MTC1)
• Move From Coprocessor 1 (MFC1)
• Doubleword Move To Coprocessor 1 (DMTC1)
• Doubleword Move From Coprocessor 1 (DMFC1)


Like the floating-point load and store operations, these operations perform no 
format conversions and never cause floating-point exceptions. 


Data transfer between CP1 control registers and the CPU is accomplished with the
following instructions: 


• Move Control Word To Coprocessor 1 (CTC1)
• Move Control Word From Coprocessor 1 (CFC1)


Load Delay and Hardware Interlocks 


The instruction immediately following a load or an MTC1 can use the contents of the loaded register. In that case, however, the hardware interlocks, costing additional real cycles; for this reason, it is desirable to schedule load delay slots so that the interlocks are avoided.


Data Alignment 


All coprocessor loads and stores reference the following aligned data items: 


• For word loads and stores, the access type is always WORD, and the low-order 2 bits of the address must always be 0.


• For doubleword loads and stores, the access type is always DOUBLEWORD, and the low-order 3 bits of the address must always be 0.
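The alignment rule amounts to checking the low-order address bits. A minimal sketch with a hypothetical is_aligned helper, where size is the access size in bytes:

```python
WORD, DOUBLEWORD = 4, 8   # access sizes in bytes


def is_aligned(address: int, size: int) -> bool:
    """True if the low-order log2(size) address bits are 0
    (2 bits for WORD, 3 bits for DOUBLEWORD)."""
    return (address & (size - 1)) == 0


print(is_aligned(0x1004, WORD), is_aligned(0x1004, DOUBLEWORD))  # True False
```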


Endianness 


Regardless of byte-numbering order (endianness) of the data, the address specifies 
the byte that has the smallest byte address in the addressed field. For a big-endian 
system, it is the leftmost byte; for a little-endian system, it is the rightmost byte. 
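Python's struct module can show which byte sits at the addressed (smallest) byte address under each byte order. In this sketch, index 0 of each packed byte string stands in for the byte at that address:

```python
import struct

value = 0x11223344
big = struct.pack(">I", value)      # big-endian byte order
little = struct.pack("<I", value)   # little-endian byte order

# Big-endian: the addressed byte is the leftmost (most significant) byte.
# Little-endian: it is the rightmost (least significant) byte.
print(hex(big[0]), hex(little[0]))  # 0x11 0x44
```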


Table 7-7 lists load, store, and transfer instructions. 
Table 7-7 Load/Store/Transfer Instructions


Instruction    Format and Description    (op | base | ft | offset)

Load Word To FPU
    LWC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Loads the contents of the word specified by the address into FPU general purpose register ft.

Store Word From FPU
    SWC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Stores the contents of FPU general purpose register ft to the memory location specified by the address.

Load Doubleword To FPU
    LDC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Loads the contents of the doubleword specified by the address into FPU general purpose registers ft and ft+1 when FR = 0, or into FPU general purpose register ft when FR = 1.

Store Doubleword From FPU
    SDC1 ft, offset(base)
    Sign-extends the 16-bit offset and adds it to the CPU register base to generate an address. Stores the contents of FPU general purpose registers ft and ft+1 (when FR = 0), or of FPU general purpose register ft (when FR = 1), to the memory location specified by the address.


Instruction    Format and Description    (COP1 | sub | rt | fs | 0)

Move Word To FPU
    MTC1 rt, fs
    Transfers the contents of CPU general purpose register rt to FPU general purpose register fs.

Move Word From FPU
    MFC1 rt, fs
    Transfers the contents of FPU general purpose register fs to CPU general purpose register rt.

Move Control Word To FPU
    CTC1 rt, fs
    Transfers the contents of CPU general purpose register rt to FPU control register fs.

Move Control Word From FPU
    CFC1 rt, fs
    Transfers the contents of FPU control register fs to CPU general purpose register rt.

Doubleword Move To FPU
    DMTC1 rt, fs
    Transfers the contents of CPU general purpose register rt to FPU general purpose register fs.

Doubleword Move From FPU
    DMFC1 rt, fs
    Transfers the contents of FPU general purpose register fs to CPU general purpose register rt.
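Every load and store in Table 7-7 forms its address the same way: sign-extend the 16-bit offset and add it to base. A sketch of that calculation, assuming 64-bit address arithmetic that wraps modulo 2^64 (the table does not say how a carry out of the sum is handled); sign_extend16 and effective_address are hypothetical names:

```python
MASK64 = (1 << 64) - 1


def sign_extend16(offset: int) -> int:
    """Interpret the low 16 bits of `offset` as a signed value."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset


def effective_address(base: int, offset: int) -> int:
    """base + sign-extended offset, wrapped to 64 bits (assumption)."""
    return (base + sign_extend16(offset)) & MASK64


print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```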
7.5.2 Convert Instructions


Convert instructions perform conversions between the various data formats, such as single- or double-precision, fixed- or floating-point formats. Table 7-8 lists the conversion instructions.


When converting a long integer to a single- or double-precision floating-point number (CVT.[S,D].L), bits 63:55 of the 64-bit integer must be all zeroes or all ones; otherwise the VR4300 processor raises a floating-point instruction exception. The floating-point instruction exception allows these cases to be handled by software.
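Requiring bits 63:55 to be all zeroes or all ones is the same as requiring the long integer to lie in the range -2^55 to 2^55 - 1. A minimal check, using the hypothetical name cvt_l_source_ok:

```python
def cvt_l_source_ok(value: int) -> bool:
    """True if bits 63:55 of the 64-bit 2's complement pattern of `value`
    are all zeroes or all ones, as CVT.S.L / CVT.D.L require."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a 64-bit long integer")
    top9 = (value & 0xFFFF_FFFF_FFFF_FFFF) >> 55   # bits 63..55
    return top9 in (0, 0x1FF)


print(cvt_l_source_ok(2**55 - 1), cvt_l_source_ok(2**55))  # True False
```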


Table 7-8 Convert Instructions (1/2)


Instruction    Format and Description    (COP1 | fmt | 0 | fs | fd | funct)

Floating-point Convert To Single Floating-point Format
    CVT.S.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to single-precision floating-point format. Stores the rounded result to floating-point register fd.

Floating-point Convert To Double Floating-point Format
    CVT.D.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to double-precision floating-point format. Stores the rounded result to floating-point register fd.

Floating-point Convert To Long Fixed-point Format
    CVT.L.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to 64-bit fixed-point format. Stores the rounded result to floating-point register fd.

Floating-point Convert To Single Fixed-point Format
    CVT.W.fmt fd, fs
    Converts the contents of floating-point register fs from the specified format (fmt) to 32-bit fixed-point format. Stores the rounded result to floating-point register fd.

Floating-point Round To Long Fixed-point Format
    ROUND.L.fmt fd, fs
    Rounds the contents of floating-point register fs, in the specified format (fmt), to the nearest value in 64-bit fixed-point format. Stores the result to floating-point register fd.

Floating-point Round To Single Fixed-point Format
    ROUND.W.fmt fd, fs
    Rounds the contents of floating-point register fs, in the specified format (fmt), to the nearest value in 32-bit fixed-point format. Stores the result to floating-point register fd.

Floating-point Truncate To Long Fixed-point Format
    TRUNC.L.fmt fd, fs
    Rounds the contents of floating-point register fs toward 0 and converts them from the specified format (fmt) to 64-bit fixed-point format. Stores the result to floating-point register fd.
Table 7-8 Convert Instructions (2/2)


Instruction    Format and Description    (COP1 | fmt | 0 | fs | fd | funct)

Floating-point Truncate To Single Fixed-point Format
    TRUNC.W.fmt fd, fs
    Rounds the contents of floating-point register fs toward 0 and converts them from the specified format (fmt) to 32-bit fixed-point format. Stores the result to floating-point register fd.

Floating-point Ceiling To Long Fixed-point Format
    CEIL.L.fmt fd, fs
    Rounds the contents of floating-point register fs toward +∞ and converts them from the specified format (fmt) to 64-bit fixed-point format. Stores the result to floating-point register fd.

Floating-point Ceiling To Single Fixed-point Format
    CEIL.W.fmt fd, fs
    Rounds the contents of floating-point register fs toward +∞ and converts them from the specified format (fmt) to 32-bit fixed-point format. Stores the result to floating-point register fd.

Floating-point Floor To Long Fixed-point Format
    FLOOR.L.fmt fd, fs
    Rounds the contents of floating-point register fs toward -∞ and converts them from the specified format (fmt) to 64-bit fixed-point format. Stores the result to floating-point register fd.

Floating-point Floor To Single Fixed-point Format
    FLOOR.W.fmt fd, fs
    Rounds the contents of floating-point register fs toward -∞ and converts them from the specified format (fmt) to 32-bit fixed-point format. Stores the result to floating-point register fd.
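The four rounding variants in Table 7-8 correspond to Python's round, math.trunc, math.ceil, and math.floor. This sketch assumes that ROUND breaks ties to even (IEEE754 round-to-nearest), which Python's round also does; to_fixed_point is a hypothetical name, and the range check stands in for the hardware's "source out of integer range" case:

```python
import math

ROUNDERS = {
    "ROUND": round,       # nearest; Python's round() breaks ties to even
    "TRUNC": math.trunc,  # toward 0
    "CEIL": math.ceil,    # toward +infinity
    "FLOOR": math.floor,  # toward -infinity
}


def to_fixed_point(x: float, mode: str, width: int = 32) -> int:
    """Round x as ROUND/TRUNC/CEIL/FLOOR.{W,L} would, then range-check it
    against a `width`-bit 2's complement format (32 for W, 64 for L)."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN or infinity cannot be converted")
    n = ROUNDERS[mode](x)
    if not -(1 << (width - 1)) <= n < (1 << (width - 1)):
        # Table 8-2 lists "source out of integer range" as an exception case.
        raise OverflowError(f"{x} is out of range for {width}-bit fixed point")
    return n


print([to_fixed_point(-2.5, m) for m in ("ROUND", "TRUNC", "CEIL", "FLOOR")])
# [-2, -2, -2, -3]
```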
7.5.3 Computational Instructions


Computational instructions perform arithmetic operations on floating-point values in registers. Table 7-9 lists the computational instructions. There are two categories of computational instructions:


• 3-Operand Register-Type instructions, which perform floating-point add, subtract, multiply, and divide operations


• 2-Operand Register-Type instructions, which perform floating-point absolute value, transfer, square root, and negate operations.


Table 7-9 Computational Instructions


Instruction    Format and Description    (COP1 | fmt | ft | fs | fd | funct)

Floating-point Add
    ADD.fmt fd, fs, ft
    Arithmetically adds the contents of floating-point registers fs and ft in the specified format (fmt). Stores the rounded result to floating-point register fd.

Floating-point Subtract
    SUB.fmt fd, fs, ft
    Arithmetically subtracts the contents of floating-point register ft from the contents of floating-point register fs in the specified format (fmt). Stores the rounded result to floating-point register fd.

Floating-point Multiply
    MUL.fmt fd, fs, ft
    Arithmetically multiplies the contents of floating-point registers fs and ft in the specified format (fmt). Stores the rounded result to floating-point register fd.

Floating-point Divide
    DIV.fmt fd, fs, ft
    Arithmetically divides the contents of floating-point register fs by the contents of floating-point register ft in the specified format (fmt). Stores the rounded result to floating-point register fd.

Floating-point Absolute Value
    ABS.fmt fd, fs
    Calculates the arithmetic absolute value of the contents of floating-point register fs in the specified format (fmt). Stores the result to floating-point register fd.

Floating-point Move
    MOV.fmt fd, fs
    Copies the contents of floating-point register fs to floating-point register fd in the specified format (fmt).

Floating-point Negate
    NEG.fmt fd, fs
    Arithmetically negates the contents of floating-point register fs in the specified format (fmt). Stores the result to floating-point register fd.

Floating-point Square Root
    SQRT.fmt fd, fs
    Calculates the arithmetic positive square root of the contents of floating-point register fs in the specified format (fmt). Stores the rounded result to floating-point register fd.
The fmt suffix appended to the opcode of an arithmetic or compare instruction indicates the data format: S indicates single-precision floating-point, D indicates double-precision floating-point, L indicates 64-bit fixed-point, and W indicates 32-bit fixed-point. For example, "ADD.D" means that the operands of the addition instruction are double-precision floating-point values.


If the FR bit is 0, an odd-numbered register cannot be specified. 


7.5.4 Compare Instructions 


The floating-point compare (C.cond.fmt) instructions interpret the contents of two 
FPU registers (fs, ft) in the specified format (fmt) and arithmetically compare
them. A result is determined based on the comparison and conditions (cond) 
specified in the instruction. Table 7-10 lists the compare instructions. Table 7-11 
lists the mnemonics for the compare instruction conditions. 


Table 7-10 Compare Instruction


Instruction    Format and Description    (COP1 | fmt | ft | fs | 0 | funct)

Floating-point Compare
    C.cond.fmt fs, ft
    Interprets and arithmetically compares the contents of FPU registers fs and ft in the specified format (fmt). The result is determined by the comparison and the specified condition (cond). After a delay of one instruction, the comparison result can be used by the FPU branch instructions of the CPU.
Table 7-11 Mnemonics and Definitions of Compare Instruction Conditions

Mnemonic  Definition                                 Mnemonic  Definition
T         True                                       F         False
UN        Unordered                                  OR        Ordered
EQ        Equal                                      NEQ       Not Equal
UEQ       Unordered or Equal                         OLG       Ordered Less Than or Greater Than
OLT       Ordered Less Than                          UGE       Unordered or Greater Than or Equal
ULT       Unordered or Less Than                     OGE       Ordered Greater Than or Equal
OLE       Ordered Less Than or Equal                 UGT       Unordered or Greater Than
ULE       Unordered or Less Than or Equal            OGT       Ordered Greater Than
SF        Signaling False                            ST        Signaling True
NGLE      Not Greater Than or Less Than or Equal     GLE       Greater Than, or Less Than or Equal
SEQ       Signaling Equal                            SNE       Signaling Not Equal
NGL       Not Greater Than or Less Than              GL        Greater Than or Less Than
LT        Less Than                                  NLT       Not Less Than
NGE       Not Greater Than or Equal                  GE        Greater Than or Equal
LE        Less Than or Equal                         NLE       Not Less Than or Equal
NGT       Not Greater Than                           GT        Greater Than
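The compare conditions can be modeled from three relations (less, equal, unordered) plus a flag that signals an Invalid Operation when the operands are unordered; each row of Table 7-11 pairs one predicate with its complement. The sketch below assumes the standard MIPS 4-bit cond encoding (bit 0 unordered, bit 1 equal, bit 2 less, bit 3 signal on unordered), which this section does not state, so treat the ordering of CONDS as an assumption. Signaling NaN operands are not modeled, and c_cond is a hypothetical name:

```python
import math

# Condition mnemonics in the assumed MIPS cond-field order (codes 0..15).
CONDS = ["F", "UN", "EQ", "UEQ", "OLT", "ULT", "OLE", "ULE",
         "SF", "NGLE", "SEQ", "NGL", "LT", "NGE", "LE", "NGT"]


def c_cond(cond: str, fs: float, ft: float) -> tuple:
    """Return (condition result, Invalid Operation signaled) for C.cond.fmt fs, ft."""
    code = CONDS.index(cond)
    unordered = math.isnan(fs) or math.isnan(ft)
    less = not unordered and fs < ft
    equal = not unordered and fs == ft
    result = ((bool(code & 0b0100) and less)
              or (bool(code & 0b0010) and equal)
              or (bool(code & 0b0001) and unordered))
    # Codes 8..15 signal Invalid Operation when the operands are unordered.
    invalid = unordered and bool(code & 0b1000)
    return result, invalid


nan = float("nan")
print(c_cond("ULT", nan, 1.0), c_cond("LT", nan, 1.0))  # (True, False) (False, True)
```

The complementary predicate in the right-hand column (for example OR for UN) is simply the negation of the result.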
7.5.5 FPU Branch Instructions


Table 7-12 lists the FPU branch instructions. These instructions can be used to 
test the result of the compare (C.cond.fmt) instruction. The delay slot in this table 
indicates the instruction that immediately follows a branch instruction. For 
details, refer to Chapter 4 Pipeline. 


Table 7-12 FPU Branch Instructions


Instruction    Format and Description    (COP1 | BC | br | offset)

Branch On FPU True
    BC1T offset
    Adds the address of the instruction in the delay slot and the 16-bit offset (shifted left 2 bits and sign-extended) to calculate the branch target address. If the FPU condition line is true, branches to the target address (with a delay of one instruction).

Branch On FPU False
    BC1F offset
    Adds the address of the instruction in the delay slot and the 16-bit offset (shifted left 2 bits and sign-extended) to calculate the branch target address. If the FPU condition line is false, branches to the target address (with a delay of one instruction).

Branch On FPU True Likely
    BC1TL offset
    Adds the address of the instruction in the delay slot and the 16-bit offset (shifted left 2 bits and sign-extended) to calculate the branch target address. If the FPU condition line is true, branches to the target address (with a delay of one instruction). If the branch is not taken, the instruction in the delay slot is nullified.

Branch On FPU False Likely
    BC1FL offset
    Adds the address of the instruction in the delay slot and the 16-bit offset (shifted left 2 bits and sign-extended) to calculate the branch target address. If the FPU condition line is false, branches to the target address (with a delay of one instruction). If the branch is not taken, the instruction in the delay slot is nullified.
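A sketch of the target calculation and of the delay-slot behavior for the four branches in Table 7-12; bc1_target and bc1_outcome are hypothetical names, and the 64-bit wraparound is an assumption:

```python
MASK64 = (1 << 64) - 1


def bc1_target(delay_slot_address: int, offset: int) -> int:
    """Delay-slot address + 16-bit offset shifted left 2 bits, sign-extended."""
    offset &= 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000                  # sign-extend the 16-bit field
    return (delay_slot_address + (offset << 2)) & MASK64


def bc1_outcome(kind: str, condition: bool) -> tuple:
    """Return (branch taken, delay-slot instruction executed) for
    BC1T, BC1F, BC1TL, or BC1FL given the FPU condition line."""
    taken = condition if kind in ("BC1T", "BC1TL") else not condition
    likely = kind.endswith("L")
    # Ordinary branches always execute the delay slot; "likely" branches
    # nullify it when the branch is not taken.
    return taken, taken or not likely


print(hex(bc1_target(0x1004, 0xFFFF)), bc1_outcome("BC1TL", False))  # 0x1000 (False, False)
```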
7.5.6 FPU Instruction Execution Time


Unlike the CPU, which executes almost all instructions in a single cycle, the FPU needs more than one cycle to execute many of its instructions.


All data transfer between the floating-point coprocessor and memory is accomplished by coprocessor load and store operations. Data may be moved directly between the floating-point coprocessor and the integer processor by move-to-coprocessor and move-from-coprocessor instructions. The execution cycles of these instructions are shown below:


Table 7-13 Number of Load/Store/Transfer Instruction Execution Cycles 


Instruction Cycles 
LWC1 2/1* 
SWC1 1
LDC1 2/1* 
SDC1 1 
MTC1 1 
MFC1 1 
DMTC1 1 
DMFC1 1 
CTC1 1 
CFC1 1 
* The hardware interlocks for one cycle if the load result is used by the instruction in the 
load delay slot. 


To obtain optimum performance, the VR4300 pipeline does not bypass the floating-point result of a compare, computational, LWC1, or LDC1 instruction from its EX stage to the EX stage of the next instruction. If the floating-point instruction entering the EX stage depends on the result of the floating-point instruction currently in the EX stage, the current instruction completes, its EX-stage result is registered in the DC stage, and the bypass is enabled. Meanwhile, the floating-point instruction in the RF stage advances to the EX stage, where it stalls for one pipeline clock waiting for the result to be bypassed from DC to EX before it begins execution.

Caution This limitation on bypass from the EX stage to the EX stage of the next instruction does not apply to integer operations or to floating-point load/store/transfer instructions (except LWC1 and LDC1).


230 User’s Manual U10504EJ7VOUMOO 




[Figure: pipeline timing of FP #1 (IC, RF, EX stages) and a dependent floating-point instruction, showing Run, Stall, Run; no bypass allowed from EX to EX, bypass from DC to EX]


Figure 7-9 DC-to-EX Hardware Interlock Bypass


The execution unit of the VR4300 can shorten the latency of almost all floating-point instructions, depending on the circumstances. This feature improves performance while keeping the design simple, and the cases in which the latency changes are kept as simple as possible. If checking the source operands of a multicycle instruction shows that an exception will occur (a source exception), the instruction executes for only 2 cycles and exception processing starts. Similarly, if checking the operands shows that the result is a value that needs no further computation (zero or infinity), the result is written back after 2 cycles and the operation ends.


Floating-point exceptions other than the source exception do not abort the instruction before it completes. In other words, an exception is reported not when it is detected but when instruction execution has completed.


Next, the execution time of each instruction is described. 


Floating-point Add/Subtract Instructions 


Floating-point add and subtract terminate in the second cycle if a source exception occurs or if at least one operand is zero or infinity. In all other cases, the instruction completes in the third cycle.
Floating-point Multiply Instruction


A floating-point multiply completes in two cycles if a source exception is detected, or if, during the first cycle, the result can be determined to be zero or infinity. A floating-point multiply also finishes in the second cycle if at least one of the operands is a power of 2. In all other cases it takes the full number of cycles (the maximum specified for each format) to complete. Multiply does not finish early when the remaining bits are zero. Also, a multiply and an add cannot overlap in execution.


Floating-point Divide/Square Root Instructions 


Floating-point divide and square root complete in the second cycle either on a source exception or if, during the first cycle, the result can be determined to be zero or infinity. Otherwise they take the maximum number of cycles.


Floating-point Convert Instruction 


Floating-point convert instructions also complete in the second cycle for trivial 
cases. 


The numbers of execution cycles of floating-point instructions are listed in Table 7-14. If the floating-point result of one of these instructions is needed by the subsequent instruction, the latency is the number of execution cycles plus one, because an EX-to-RF bypass is not performed for the results of these instructions. All CPU/FPU instruction delay times not mentioned in these tables have a latency of one pipeline clock cycle (1PClock).
7.6 FPU Pipeline Synchronization


Since the integer and floating-point units share a common hardware pipeline, a 
CFC1 instruction is not needed to synchronize the pipeline operation. 


Table 7-14 Number of FPU Instruction Delay Cycles*1


                 Pipeline Cycles
Instruction      S     D     W     L
Add.fmt
Sub.fmt
Mul.fmt
Div.fmt          29    58
Sqrt.fmt         29    58
Abs.fmt
Mov.fmt
Neg.fmt
Round.W.fmt
Trunc.W.fmt
Ceil.W.fmt
Floor.W.fmt
Round.L.fmt
Trunc.L.fmt
Ceil.L.fmt
Floor.L.fmt
Cvt.S.fmt
Cvt.D.fmt
Cvt.W.fmt
Cvt.L.fmt
C.cond.fmt


*1. If the result of a floating-point instruction is needed by the subsequent instruction, one additional pipeline clock is required to perform a hardware interlock bypass.

*2. Multicycle floating-point instructions whose results are determined early (trivial cases) are not covered by this table; such instructions take two pipeline clocks to complete.

*3. The architecturally defined branch delay slot of one cycle also applies to all FPU branch instructions.
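Notes *1 and *2 together give a simple latency rule. A sketch, in which the only cycle counts taken from Table 7-14 are those for divide and square root; fpu_latency is a hypothetical name:

```python
# Divide and square-root cycle counts from Table 7-14 (S and D formats).
CYCLES = {("DIV", "S"): 29, ("DIV", "D"): 58,
          ("SQRT", "S"): 29, ("SQRT", "D"): 58}


def fpu_latency(op: str, fmt: str, trivial: bool, dependent: bool) -> int:
    """Pipeline clocks charged for the instruction.

    trivial:   result determined early (source exception, zero, infinity);
               per note *2 such instructions take two pipeline clocks.
    dependent: the next instruction needs the result; per note *1 this
               costs one extra clock for the hardware interlock bypass.
    """
    cycles = 2 if trivial else CYCLES[(op, fmt)]
    return cycles + 1 if dependent else cycles


print(fpu_latency("DIV", "D", trivial=False, dependent=True))  # 59
```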
This chapter explains how the FPU handles floating-point exceptions.
8.1 Types of Exceptions


A floating-point exception occurs when a floating-point operation, or the result of the operation, cannot be handled in the ordinary way.
The FPU performs one of the following two operations when an exception occurs.


• When the exception is enabled
Sets the Cause bit of the FPU Control/Status register (FCR31) and transfers control to the exception handler routine (software servicing).


• When the exception is disabled
Stores an appropriate value (default value) to the FPU Destination register, sets the Cause bit and Flag bit of FCR31, and continues execution.


The FPU supports the five IEEE754 exceptions:
• Inexact (I)
• Overflow (O)
• Underflow (U)
• Division by Zero (Z)
• Invalid Operation (V)
Each of these exceptions has a Cause bit, an Enable bit, and a Flag bit (status flag).


The FPU has a sixth exception cause, unimplemented operation (E), which is used when a floating-point operation cannot be executed within the standard MIPS architecture (including when the FPU cannot correctly process an exception). This exception requires servicing by software. The E bit exists only among the Cause bits; there is no corresponding Enable or Flag bit. When this exception occurs, unimplemented operation exception processing is executed (when interrupt input from the FPU to the CPU is enabled).


Figure 8-1 shows the bits of FCR31 used to support the exceptions.


Remark The unimplemented operation exception is not one of the exceptions defined by the IEEE754 standard. With the VR4300, it is an exception that occurs if an operation not supported by the hardware is executed.
Bit #          17  16  15  14  13  12
Cause Bits      E   V   Z   O   U   I

Bit #              11  10   9   8   7
Enable Bits         V   Z   O   U   I

Bit #               6   5   4   3   2
Flag Bits           V   Z   O   U   I


I  Inexact Operation
U  Underflow
O  Overflow
Z  Division by Zero
V  Invalid Operation
E  Unimplemented Operation


Figure 8-1 FCR31 Cause/Enable/Flag Bits


The five IEEE754 exceptions (V, Z, O, U, and I) are enabled when the corresponding Enable bit is set. When an exception occurs, the corresponding Cause bit is set. If the corresponding Enable bit is set, the FPU generates an interrupt to the CPU and exception processing starts. If the exception is disabled, the Cause and Flag bits corresponding to the exception are set.
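Figure 8-1 and the paragraph above can be turned into a small decoder. In this sketch (hypothetical names), an exception is taken when a Cause bit is set together with its Enable bit, or when the E Cause bit is set, since E has no Enable bit:

```python
# Bit positions from Figure 8-1. Within each field the lowest bit is I,
# followed by U, O, Z, V (and E, in the Cause field only).
ORDER = "IUOZV"
CAUSE_LSB, ENABLE_LSB, FLAG_LSB = 12, 7, 2


def _field(fcr31: int, lsb: int, names: str) -> set:
    return {name for i, name in enumerate(names) if (fcr31 >> (lsb + i)) & 1}


def decode_fcr31(fcr31: int) -> dict:
    """Return the letters of the Cause, Enable, and Flag bits that are set."""
    return {
        "cause": _field(fcr31, CAUSE_LSB, ORDER + "E"),
        "enable": _field(fcr31, ENABLE_LSB, ORDER),
        "flag": _field(fcr31, FLAG_LSB, ORDER),
    }


def raises_exception(fcr31: int) -> bool:
    """True if a Cause bit is set with its Enable bit, or the E Cause bit is set."""
    d = decode_fcr31(fcr31)
    return bool(d["cause"] & d["enable"]) or "E" in d["cause"]


print(decode_fcr31((1 << 16) | (1 << 11)))  # V Cause and V Enable set
```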


8.2 Exception Processing 


When a floating-point exception is taken, the Cause register of CP0 indicates that the FPU is the cause of the exception. The Floating-Point Exception (FPE) code is used, and the Cause bits of FCR31 indicate the reason for the floating-point exception. These bits are, in effect, an extension of the CP0 Cause register.
8.2.1 Flags


A Flag bit is provided for each IEEE754 exception. The Flag bit is set when the corresponding exception is disabled and the condition of the exception is detected. A Flag bit can be reset by writing a new value to the Control/Status register (FCR31) with the CTC1 instruction.


If an exception is disabled by the corresponding Enable bit, the FPU performs predetermined processing: instead of the result of the floating-point operation, it supplies a default value, which is determined by the type of the exception. For the overflow and underflow exceptions, the default value also depends on the rounding mode in effect at that time. Table 8-1 shows the default values supplied for the respective IEEE754 exceptions of the FPU.


Table 8-1 Default FPU IEEE754 Exception Values


Field  Description        Rounding Mode  Default Value
V      Invalid operation  -              Supply a quiet Not a Number (Q-NaN)
Z      Division by zero   -              Supply a properly signed ∞
O      Overflow           RN             ∞ signed with the intermediate result
                          RZ             Maximum normal number signed with the intermediate result
                          RP             Negative overflow: maximum negative normal number
                                         Positive overflow: +∞
                          RM             Positive overflow: maximum positive normal number
                                         Negative overflow: -∞
U      Underflow          RN             0 signed with the intermediate result
                          RZ             0 signed with the intermediate result
                          RP             Positive underflow: minimum positive normal number
                                         Negative underflow: 0
                          RM             Negative underflow: minimum negative normal number
                                         Positive underflow: 0
I      Inexact            -              Supply a rounded result
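For double precision, the overflow rows of Table 8-1 can be sketched as follows; overflow_default is a hypothetical name, and sys.float_info.max stands for the maximum normal number:

```python
import math
import sys

MAX_NORMAL = sys.float_info.max   # largest finite double


def overflow_default(positive: bool, mode: str) -> float:
    """Default result for a disabled Overflow exception (Table 8-1)."""
    sign = 1.0 if positive else -1.0
    if mode == "RN":      # infinity with the sign of the intermediate result
        return sign * math.inf
    if mode == "RZ":      # largest normal number with that sign
        return sign * MAX_NORMAL
    if mode == "RP":      # rounding toward +infinity
        return math.inf if positive else -MAX_NORMAL
    if mode == "RM":      # rounding toward -infinity
        return MAX_NORMAL if positive else -math.inf
    raise ValueError(f"unknown rounding mode {mode!r}")


print(overflow_default(False, "RP"))  # -1.7976931348623157e+308
```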
The FPU detects nine exception causes internally. When the FPU detects one of these unusual situations, it causes either an IEEE754 exception or an unimplemented operation exception (E). Table 8-2 lists the exception-causing situations and compares the contents of the FPU Cause bits with the IEEE754 standard when each exception occurs.


Table 8-2 FPU Internal Results and Flag Status


FPU Internal Result          IEEE754  Exception  Exception  Remarks
                                      Enable     Disable
Inexact result               I        I          I          Loss of accuracy
Exponent overflow            O*1      O,I        O,I        Normalized exponent > Emax
Division by zero             Z        Z          Z          Zero is (exponent = Emin-1, mantissa = 0)
Overflow on convert to       V        E          E          Source out of integer range
  integer
Signaling NaN (S-NaN)        V        V          V
  source
Invalid operation            V        V          V          0/0, etc.
Exponent underflow           U        E          U,I        Normalized exponent < Emin
Denormalized source          None     E          E          Exponent = Emin-1 and mantissa ≠ 0
Q-NaN source                 None     E          E


*1. With IEEE754, an overflow signals the inexact operation exception only when the overflow exception is disabled. However, the VR4300 always generates both the overflow exception and the inexact operation exception when an overflow occurs.


If both the underflow exception and the inexact operation exception are disabled when an exponent underflow occurs, and if the FS bit of FCR31 is set, the Cause bits and Flag bits of the underflow exception and the inexact operation exception are set. Otherwise, the Cause bit of the unimplemented operation exception is set.
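The exponent-underflow rule in the note above reduces to a three-input decision. A minimal sketch with a hypothetical function name; it returns the Cause bits that get set:

```python
def exponent_underflow_causes(underflow_enabled: bool, inexact_enabled: bool,
                              fs_bit: bool) -> set:
    """Cause bits set for an exponent underflow (Table 8-2 and its note)."""
    if not underflow_enabled and not inexact_enabled and fs_bit:
        return {"U", "I"}   # the U and I Flag bits are set as well
    return {"E"}            # unimplemented operation: left to software


print(exponent_underflow_causes(False, False, True))  # U and I
```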


Next, each FPU exception is described. 
8.2.2 Inexact Exception (I)


The FPU generates the inexact operation exception in the following cases:
• If the rounded result is not exact (accuracy is lost)
• If the rounded result overflows


• If the rounded result underflows and the FS bit of FCR31 is set, with the underflow and inexact operation exceptions disabled


If Exception Is Enabled: 


The Destination register is not modified, the Source registers are preserved and an 
Inexact Operation exception occurs. 


If Exception Is Not Enabled: 


The rounded result or underflowed/overflowed result is delivered to the 
Destination register if no other exception occurs. 


8.2.3 Invalid Operation Exception (V) 


The Invalid Operation exception is generated if one or both of the operands are invalid for the operation. When the exception is not enabled, the MIPS ISA defines the result as a quiet Not a Number (Q-NaN). The invalid operations are:


• Add or subtract: addition or subtraction of infinities, such as $(+\infty) + (-\infty)$ or $(-\infty) - (-\infty)$


• Multiply: $\pm 0 \times \pm\infty$
• Divide: $\pm 0 \div \pm 0$ or $\pm\infty \div \pm\infty$


• Compare of predicates involving < or > without ?, when the operands are unordered


• Any arithmetic operation, when one or both operands is an S-NaN. A transfer (MOV) operation is not considered to be an arithmetic operation, but absolute value (ABS) and negate (NEG) are.


• Compare or convert-to-floating-point operation when the operand is an S-NaN.


• Square root: $\sqrt{x}$, where $x$ is less than zero.
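A sketch of the arithmetic cases in the list above, for ordinary Python floats; is_invalid is a hypothetical name, and signaling NaN operands are not modeled because Python floats do not distinguish them:

```python
import math


def is_invalid(op: str, a: float, b: float = 0.0) -> bool:
    """True if `op` on non-NaN operands a (and b) is an invalid operation."""
    if op == "ADD":   # infinities of opposite sign
        return math.isinf(a) and math.isinf(b) and (a > 0) != (b > 0)
    if op == "SUB":   # infinities of the same sign
        return math.isinf(a) and math.isinf(b) and (a > 0) == (b > 0)
    if op == "MUL":   # zero times infinity, in either order
        return (a == 0 and math.isinf(b)) or (math.isinf(a) and b == 0)
    if op == "DIV":   # 0/0 or inf/inf
        return (a == 0 and b == 0) or (math.isinf(a) and math.isinf(b))
    if op == "SQRT":  # -0.0 < 0 is False, so sqrt(-0.0) stays valid
        return a < 0
    raise ValueError(f"unsupported operation {op!r}")


inf = math.inf
print(is_invalid("ADD", inf, -inf), is_invalid("SQRT", -0.0))  # True False
```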
Software can simulate the Invalid Operation exception for other operations that are invalid for the given source operands. Examples of these operations include IEEE754-specified functions implemented in software, such as Remainder x REM y, where y is 0 or x is infinite; conversion of a floating-point number to a decimal format whose value causes an overflow, is infinity, or is NaN; and transcendental functions, such as $\ln(-5)$ or $\cos^{-1}(3)$. Refer to Chapter 17 FPU Instruction Set Details. Refer to Appendix B for examples or for routines to handle these cases.
If Exception Is Enabled: 

The Destination register is not modified, the Source registers are preserved, and 
the Invalid Operation Exception occurs. 


If Exception Is Not Enabled: 


If no other exception occurs, a Q-NaN is stored to the Destination register.


8.2.4 Divide-by-Zero Exception (Z) 


The Divide-by-Zero exception occurs if the divisor is zero and the dividend is a
finite nonzero number. Software can simulate this exception for other operations
that produce a signed infinity, such as ln(0), sec(π/2), or 0⁻¹.


If Exception Is Enabled: 


The contents of the Destination register are not changed, the contents of the
Source registers are preserved, and the Divide-by-Zero exception occurs.


If Exception Is Not Enabled: 


If no other exception occurs, an infinity (±∞) whose sign is determined by the
signs of the operands is stored in the Destination register.
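Here is a minimal C sketch of the Divide-by-Zero condition and its disabled-exception result, assuming IEEE754 doubles. The function names are hypothetical. The sign of the default infinity is the exclusive OR of the operand signs, which is the IEEE754 rule, so the sign of a zero divisor matters.

```c
#include <math.h>
#include <stdbool.h>

/* True when Divide-by-Zero (Z) applies: a zero divisor and a finite nonzero
   dividend. 0 / 0 is excluded because it is an Invalid Operation, and
   inf / 0 is excluded because it is an exact infinite result. */
bool is_divide_by_zero(double dividend, double divisor)
{
    return divisor == 0.0 && isfinite(dividend) && dividend != 0.0;
}

/* Default result when Z is disabled: an infinity whose sign is the
   exclusive OR of the operand signs (for example, 1 / -0 gives -inf). */
double divide_by_zero_default(double dividend, double divisor)
{
    bool negative = (signbit(dividend) != 0) != (signbit(divisor) != 0);
    return negative ? -INFINITY : INFINITY;
}
```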
8.2.5 Overflow Exception (O)


The Overflow exception occurs when the magnitude of the rounded floating-point
result, with an unbounded exponent range, is larger than the largest finite number
of the destination format. (The Inexact exception and flag bits are also set.)


If Exception Is Enabled: 


The contents of the Destination register are not modified, the Source registers
are preserved, and the Overflow exception occurs.


If Exception Is Not Enabled: 


If no other exception occurs, the default value determined by the rounding
mode is stored in the Destination register (refer to Table 8-1 Default
FPU IEEE754 Exception Values).
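Table 8-1 is not reproduced in this section, so the following C sketch encodes the IEEE754 defaults for an untrapped overflow instead. Compare it with Table 8-1 before relying on it. The sketch assumes the usual MIPS encoding of the FCR31 RM field (0 = round to nearest, 1 = toward zero, 2 = toward +∞, 3 = toward −∞). `overflow_default` is a hypothetical name.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* FCR31 RM field values, assuming the usual MIPS assignment. */
enum { RM_RN = 0, RM_RZ = 1, RM_RP = 2, RM_RM = 3 };

/* Default double result of an overflow with the Overflow exception disabled.
   Round to nearest gives an infinity, round toward zero gives the largest
   finite number, and a directed mode gives an infinity only when the result
   lies in the direction of rounding. */
double overflow_default(unsigned rm, bool negative)
{
    switch (rm & 3u) {
    case RM_RN: return negative ? -INFINITY : INFINITY;
    case RM_RZ: return negative ? -DBL_MAX : DBL_MAX;
    case RM_RP: return negative ? -DBL_MAX : INFINITY;
    default:    return negative ? -INFINITY : DBL_MAX; /* RM_RM */
    }
}
```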


8.2.6 Underflow Exception (U) 


Two related events generate the Underflow exception:
• creation of a tiny result: the operation result is nonzero and lies
between −2^Emin and +2^Emin


• extraordinary loss of accuracy during the approximation of such
tiny numbers by denormalized numbers


IEEE754 permits several methods of detecting underflow; however, the same
method must be used for all operations.


Tininess can be detected in either of the following two ways:


• after rounding (when a nonzero result, computed as though the
exponent range were unbounded, would lie strictly between ±2^Emin)


• before rounding (when a nonzero result, computed as though both the
exponent range and the precision were unbounded, would lie strictly
between ±2^Emin)


The MIPS architecture detects an underflow after rounding. 


Loss of accuracy can be detected in either of the following two ways:


• denormalization loss (when the delivered result differs from the result
that would have been computed if the exponent range were unbounded)


• inexact result (when the delivered result differs from the result that
would have been computed if both the exponent range and the precision
were unbounded)


The MIPS architecture detects a drop in the accuracy as an inexact result. 
If Exception Is Enabled:


If the Underflow or Inexact Operation exception is enabled, or if the FS
bit of the FCR31 register is not set, the Unimplemented Operation exception (E)
occurs. The contents of the Destination register are not changed.


If Exception Is Not Enabled: 


If the Underflow and Inexact Operation exceptions are both disabled and the
FS bit of the FCR31 register is set, the default value determined by the rounding
mode is stored in the Destination register (refer to Table 8-1 Default FPU
IEEE754 Exception Values).
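The enable and FS conditions above reduce to a single test on the FCR31 value. This C sketch assumes the standard MIPS FCR31 layout: Inexact enable at bit 7, Underflow enable at bit 8, and FS at bit 24. Check those positions against the FCR31 description in this manual. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* FCR31 bit positions, assuming the standard MIPS layout. */
#define FCR31_ENABLE_I (1u << 7)   /* Inexact Operation enable */
#define FCR31_ENABLE_U (1u << 8)   /* Underflow enable */
#define FCR31_FS       (1u << 24)  /* flush denormalized results */

/* True when an underflowing result raises the Unimplemented Operation
   exception (E); false when the rounding-mode default is stored instead. */
bool underflow_raises_e(uint32_t fcr31)
{
    bool enabled = (fcr31 & (FCR31_ENABLE_U | FCR31_ENABLE_I)) != 0;
    bool fs = (fcr31 & FCR31_FS) != 0;
    return enabled || !fs;
}
```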


8.2.7 Unimplemented Operation Exception (E) 


If an attempt is made to execute an instruction whose operation code or format
code is reserved for future expansion, the E bit is set and an exception occurs.
The operands and the contents of the Destination register are not changed.
Usually, such instructions are emulated in software. If an emulated operation
causes an IEEE754 exception, the emulation must simulate that exception.


The Unimplemented Operation exception also occurs in the following cases, in
which the hardware detects an abnormal operand or result that it cannot handle
correctly:


• If an operand is a denormalized number (except for compare instructions)
• If an operand is a Q-NaN (except for compare instructions)


• If the result is a denormalized number or underflows, when the Underflow
or Inexact Operation exception is enabled or when the FS bit of the
FCR31 register is not set


• If a reserved instruction is executed
• If an unimplemented format is used


• If a format that is invalid for the operation is used (e.g., CVT.S.S)


Caution If a type conversion or arithmetic instruction is executed with a
denormalized number or NaN as an operand, the exception occurs. The
exception does not occur when a transfer instruction is executed, even if
the operand is a denormalized number or NaN.


How the Unimplemented Operation exception is used is left to the system. For
full compatibility with IEEE754, software can handle the exception whenever it
occurs.


If Exception Is Enabled:
The contents of the Destination register are not changed, the contents of the Source
registers are preserved, and the Unimplemented Operation exception occurs.

If Exception Is Not Enabled: 


This exception cannot be disabled because there is no corresponding Enable bit. 


Restrictions: 


An unimplemented operation exception will occur in response to the execution of 
a type conversion instruction in the following cases. 


• If an overflow occurs during conversion to integer format
• If the source operand is an infinite number
• If the source operand is NaN


The type conversion instructions affected by this restriction are as follows. 


CEIL.L.fmt    fd, fs        FLOOR.L.fmt   fd, fs
CEIL.W.fmt    fd, fs        FLOOR.W.fmt   fd, fs
CVT.D.fmt     fd, fs        ROUND.L.fmt   fd, fs
CVT.L.fmt     fd, fs        ROUND.W.fmt   fd, fs
CVT.S.fmt     fd, fs        TRUNC.L.fmt   fd, fs
CVT.W.fmt     fd, fs        TRUNC.W.fmt   fd, fs
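For the integer-format conversions listed above, a handler or emulator could predict this exception with a range check like the following C sketch. It is hypothetical and covers only the W (32-bit) and L (64-bit) targets. It takes the value after the instruction's rounding step (ceiling, floor, round, or truncate) and does not model CVT.S.fmt or CVT.D.fmt.

```c
#include <math.h>
#include <stdbool.h>

/* Returns true if converting an already-rounded value to a 32-bit (W) or
   64-bit (L) integer would raise the Unimplemented Operation exception under
   the restriction above: the source is NaN or infinite, or the value falls
   outside the integer format's representable range. */
bool cvt_to_int_raises_e(double rounded, bool to_64bit)
{
    if (isnan(rounded) || isinf(rounded))
        return true;
    if (to_64bit) /* representable range is [-2^63, 2^63) */
        return rounded < -9223372036854775808.0 ||
               rounded >= 9223372036854775808.0;
    /* representable range is [-2^31, 2^31) */
    return rounded < -2147483648.0 || rounded >= 2147483648.0;
}
```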


8.3 Saving and Returning State 


Sixteen* doubleword LDC1 or SDC1 operations save or restore the coprocessor
floating-point register state in memory. The contents of the Control/Status
register can be saved to, or restored from, a CPU register with the CFC1 and
CTC1 instructions. Normally, the Control/Status register is saved first and
restored last.


When state is restored, the information in the Control/Status register indicates
any exceptions that are pending.


Writing zero to the Cause field of the FCR31 register clears all pending
exceptions, permitting normal processing to restart after the floating-point
register state is restored.


* 32 doublewords if the FR bit is set to 1. 
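The doubleword count and the Cause-field clear described above can be captured in a small C sketch. It assumes the standard MIPS FCR31 layout, in which the Cause field occupies bits 17:12. The helper names are hypothetical. The actual LDC1/SDC1 and CFC1/CTC1 sequence would be written in assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cause field of FCR31 (bits 17:12: E, V, Z, O, U, I), assuming the
   standard MIPS layout. */
#define FCR31_CAUSE_MASK (0x3Fu << 12)

/* Number of LDC1/SDC1 doubleword transfers needed for the floating-point
   register state: sixteen when the Status FR bit is 0, thirty-two when 1. */
unsigned fpr_doubleword_count(bool fr)
{
    return fr ? 32u : 16u;
}

/* Value to write back with CTC1 so that no exception is pending when normal
   processing resumes. Flags, enables, and mode bits are kept unchanged. */
uint32_t fcr31_without_pending(uint32_t saved_fcr31)
{
    return saved_fcr31 & ~FCR31_CAUSE_MASK;
}
```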
8.4 Handling of IEEE754 Exceptions


IEEE754 recommends that an exception handler be provided for each of the five
standard exceptions. The handler can compute a substitute result and store it in
the Destination register.


By fetching the instruction indicated by the Exception Program Counter (EPC)
register, the exception handler can determine:


• the exceptions that occurred during the operation
• the operation being performed
• the destination format


To obtain the correctly rounded result when an Overflow, Underflow (except for
conversion instructions), or Inexact Operation exception occurs, the exception
handler must examine the Source registers or simulate the instruction in
software.


On Invalid Operation and Divide-by-Zero exceptions, and on Overflow or
Underflow exceptions that occur on floating-point conversions, the exception
handler obtains the operand values by examining the Source registers of the
instruction.


The IEEE754 recommends that, if enabled, the overflow and underflow 
exceptions take precedence over a separate inexact exception. This prioritization 
is accomplished in software; hardware sets the bits for both the Overflow or 
Underflow exception and the Inexact exception. 
This chapter describes the VR4300 initialization interface and the processor
modes. It covers the reset signals and reset types, the initialization sequence
with its signals and timing dependencies, and the user-selectable VR4300
processor modes.


User’s Manual U10504EJ7VOUMOO 247 


Chapter 9 


9.1 Functional Overview 
The VR4300 processor has the following three types of reset, which are
controlled by the ColdReset and Reset signals:


Power-ON Reset: When the ColdReset signal is asserted active after power
is applied and has become stable, all clocks are restarted. A Power-ON Reset
completely initializes the internal state of the processor without saving any
state information.


Cold Reset: When the ColdReset signal is asserted active while the processor
is operating, all clocks are restarted. A Cold Reset completely initializes the
internal state of the processor without saving any state information.


Soft Reset: Restarts the processor without affecting the clocks. Most of the
processor state can be retained across a Soft Reset.


After reset, the processor is bus master and drives the SysAD(31:0) bus. 


Care must be taken to coordinate system reset with other system elements. In
general, bus errors immediately before, during, or after a reset may result in
undefined operation. Because a reset initializes only part of the internal state
of the VR4300 processor, software must complete the initialization.


The operation of each type of reset is described in sections that follow. Refer to 
Figures 9-1 to 9-3 later in this chapter for timing diagrams of the Power-ON, 
Cold, and Soft Resets. 
9.2 Reset Signal Description


This section describes the two reset signals, ColdReset and Reset. 


ColdReset signal 


The ColdReset signal must be asserted active to initialize the processor with a
Power-ON Reset or Cold Reset. At this time, the Reset signal can be either active
or inactive. Set DivMode(1:0)* before the Power-ON Reset.


Keep the ColdReset signal asserted active for at least 64000 MasterClock
cycles before deasserting it. ColdReset does not need to be synchronized with
MasterClock. When ColdReset is deasserted inactive, the SClock, TClock, and
SyncOut clock signals start operating in synchronization with MasterClock.


* In VR4300 and VR4305. In VR4310, DivMode(2:0).


Reset signal 


At Power-ON Reset or Cold Reset, either change this signal in synchronization
with MasterClock or keep it inactive.


At Soft Reset, assert and deassert this signal in synchronization with
MasterClock.


9.2.1 Power-ON Reset 


Power-ON Reset is used to completely reset the processor. As a result: 


• The TS, SR, and RP bits of the Status register and the EP(3:0) bits of the
Config register are cleared to 0.


• The ERL and BEV bits of the Status register and the BE bit of the Config
register are set to 1.


• The upper-limit value (31) is written to the Random register.


• The EC(2:0) bits of the Config register are set from the DivMode(1:0)*
pins.


• All other internal state is undefined.
* In VR4300 and VR4305. In VR4310, DivMode(2:0).


For a Power-ON Reset, once the power supply to the processor has stabilized,
assert the ColdReset signal active for at least 64000 MasterClock cycles
(0.96 ms with a 66.7 MHz MasterClock).
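The 0.96 ms figure is 64000 cycles divided by the MasterClock frequency. Here is a small C sketch of that arithmetic. The function name is hypothetical.

```c
/* Minimum ColdReset assertion time in milliseconds for a Power-ON or Cold
   Reset, given the MasterClock frequency in MHz. The 64000-cycle
   requirement is taken from the text above. */
double coldreset_min_ms(double masterclock_mhz)
{
    const double cycles = 64000.0;
    /* MHz x 1000 = MasterClock cycles per millisecond */
    return cycles / (masterclock_mhz * 1000.0);
}
```

For a 66.7 MHz MasterClock this gives about 0.96 ms.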
Fix the DivMode signals before the ColdReset signal is asserted active, and do
not change them afterward. If the DivMode signals change after ColdReset has
been asserted active, the operation of the processor is not guaranteed.


When asserting the ColdReset signal active, the Reset signal may be active or 
inactive. However, do not change the value of the Reset signal during the reset 
sequence. 


Keep the Reset signal active for the duration of 16 MasterClock cycles 
immediately after the ColdReset signal has been deasserted inactive. 


The output signals of the system interface are as follows during the reset period.
• PValid signal: 1
• PReq signal: 1
• PMaster signal: 0
• SysAD(31:0): Undefined
• SysCmd(4:0): Undefined


When reset has been completed, the processor serves as the bus master and
drives SysAD(31:0). The processor branches to the reset exception vector and
starts executing the reset exception code.


9.2.2 Cold Reset 
A Cold Reset is used to completely reset the processor. As a result:


• the TS, SR, and RP bits of the Status register and the EP(3:0) bits of
the Config register are cleared to 0


• the ERL and BEV bits of the Status register and the BE bit of the
Config register are set to 1


• the upper-bound value (31) is written to the Random register


• all other state is undefined


When executing cold reset, keep the ColdReset signal active for the duration of 
64000 MasterClock cycles or more (0.96 ms during external 66.7-MHz 
operation). 


When asserting the ColdReset signal active, the Reset signal may be active or
inactive. However, do not change the value of the Reset signal during the reset
sequence.


Keep the Reset signal active for the duration of 16 MasterClock cycles
immediately after the ColdReset signal has been deasserted inactive.


The output signals of the system interface are as follows during the reset period.
• PValid signal: 1
• PReq signal: 1
• PMaster signal: 0
• SysAD(31:0): Undefined
• SysCmd(4:0): Undefined


When reset has been completed, the processor serves as the bus master and
drives SysAD(31:0). The processor branches to the reset exception vector and
starts executing the reset exception code.


9.2.3 Soft Reset 


A Soft Reset is used to reset the processor without affecting the output clocks; in 
other words, a Soft Reset is a logic reset. In a Soft Reset, the processor retains as 
much state information as possible; all state information except for the following 
is retained: 


• the BEV, SR, and ERL bits of the Status register are set (to 1)
• the TS and RP bits of the Status register are cleared (to 0)
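The Status register changes made by a Soft Reset can be modeled as a mask operation. This C sketch assumes the MIPS R4000-family Status bit positions: ERL = bit 2, SR = bit 20, TS = bit 21, BEV = bit 22, RP = bit 27. Verify them against the Status register description in this manual. The sketch is a model for reasoning about reset state, not code that runs on the processor.

```c
#include <stdint.h>

/* Status register bits, assuming the MIPS R4000-family layout. */
#define SR_ERL (1u << 2)
#define SR_SR  (1u << 20)
#define SR_TS  (1u << 21)
#define SR_BEV (1u << 22)
#define SR_RP  (1u << 27)

/* Models the Status register after a Soft Reset: BEV, SR, and ERL are set,
   TS and RP are cleared, and every other bit keeps its previous value. */
uint32_t status_after_soft_reset(uint32_t status)
{
    status |= SR_BEV | SR_SR | SR_ERL;
    status &= ~(SR_TS | SR_RP);
    return status;
}
```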


Because a Soft Reset takes effect as soon as the Reset signal is asserted
active, undefined data may remain if a multicycle operation, such as a cache
miss or a floating-point instruction, is in progress.


Keep the Reset signal asserted active for at least 16 MasterClock cycles, and
meet the setup and hold times relative to MasterClock.


After the reset is completed, the processor becomes bus master and drives the
SysAD(31:0) bus. It then branches to the Reset exception vector and begins
executing the reset exception code.


If the Reset signal is asserted in the middle of a SysAD(31:0) transaction, care
must be taken to reset all external agents to avoid SysAD(31:0) bus contention.

* Determine the DivMode signals before the ColdReset signal is asserted active.
In VR4300 and VR4305. In VR4310, DivMode(2:0).


Figure 9-1 Power-ON Reset 
Figure 9-2 Cold Reset


Figure 9-3 Soft Reset


9.3 VR4300 Processor Modes


The VR4300 processor supports several user-selectable modes. All modes except
DivMode are set and reset by writing to the Status or Config register.


9.3.1 Power Modes 


The VR4300 supports three power modes: normal power, low power (100 MHz
model of the VR4300 and the VR4305 only), and power-off.


Normal Power Mode 


Normally, the processor clock (PClock) is generated from the input clock
(MasterClock). The frequency ratio of PClock to MasterClock is set by the
DivMode(1:0)* pins. For the settings, refer to Table 2-2 Clock/Control Interface
Signals. The frequency of the system interface clock (SClock) is the same as
that of MasterClock.


Default state is normal clocking, and the processor returns to default state after any 
reset. 


* In VR4300 and VR4305. In VR4310, DivMode(2:0).


Low Power Mode (100 MHz model of VR4300 and VR4305 only)


The user may set the processor to low power mode by setting the RP bit of the
Status register to 1. In RP mode, the processor stalls the pipeline and enters a
quiescent state (the store buffers are emptied and all cache misses are
resolved). Operation in RP mode is guaranteed only when MasterClock is 40 MHz
or higher. The frequency of PClock drops to one quarter of its normal value, and
SClock and TClock also drop to one quarter of their normal speed.


This feature reduces the power consumed by the processor chip to 25% of its 
normal value. 


Software must guarantee the proper operation of the system when setting or
clearing the RP bit:
1. The behavior of circuits such as the DRAM refresh counter changes when the
operating frequency changes. Therefore, write new values to the registers of
the external agent that are directly affected by the change in frequency.


2. Put the system interface in the inactive state. For example, execute a read
from an uncached area and make sure the write buffer is empty before that
instruction completes. Then the RP bit can be set or cleared.


3. Make sure that the eight instructions before and after the MTC0 instruction
that sets or clears the RP bit do not cause cache misses or exceptions such
as TLB misses.


Power Off Mode 


Before entering power off mode, the system retains as much information as
possible by writing the contents of the CP0 registers, the floating-point
registers, and the Program Counter to memory. Dirty data cache lines are also
written back to memory.


9.3.2 Privilege Modes 


The VR4300 supports extended addressing in each of its three system privilege
modes: Kernel, Supervisor, and User. This section describes these three modes.


Kernel Extended Addressing 


When the KX bit of the Status register is set to 1, the extended TLB miss
exception vector is used when a TLB miss exception occurs for a Kernel address.
In Kernel mode, the MIPS III instruction set can always be used, regardless of
the KX bit.


Supervisor Extended Addressing 


If the SX bit of the Status register is set to 1, the MIPS III instruction set can be
used in Supervisor mode, and the extended TLB miss exception vector is used when
a TLB miss exception occurs for a Supervisor address. If this bit is cleared, the
MIPS I and II instruction sets and 32-bit virtual addresses are used.


User Extended Addressing 


If the UX bit of the Status register is set to 1, the MIPS III instruction set can be
used in User mode, and the extended TLB miss exception vector is used when a
TLB miss exception occurs for a User address. If this bit is cleared, the MIPS I
and II instruction sets and 32-bit virtual addresses are used.


9.3.3 Floating-Point Registers 


If the FR bit of the Status register is set to 1, all thirty-two 64-bit floating-point
registers defined by the MIPS III architecture can be accessed. If this bit is cleared,
the processor accesses the sixteen 64-bit floating-point registers defined by the
MIPS II architecture.


9.3.4 Reverse Endianness


If the RE bit of the Status register is set to 1, the byte order (endianness) in
User mode is reversed.


9.3.5 Instruction Trace Support 


If the ITS bit of the Status register is set to 1, the physical address of the branch
destination can be output on SysAD(31:0) whenever the instruction address
changes because a jump or branch instruction is executed or an exception occurs.
This function is disabled when the ITS bit is cleared.


This function forcibly generates an instruction cache miss in the following
cases:


• If the branch condition is satisfied when a branch instruction is
executed


• If the contents of the PC are changed by execution of a jump
instruction or by occurrence of an exception


When the instruction cache miss occurs, a processor block read request is issued
on SysAD(31:0), which informs external logic of the address change. Return the
response data for this block read request in the same manner as for a normal
request.


The address to be output is not a PC value (virtual address) but a physical address. 


9.3.6 Bootstrap Exception Vector (BEV) 


This bit is used when diagnostic tests cause exceptions to occur prior to verifying 
proper operation of the cache and main memory system. The Bootstrap Exception 
Vector (BEV) bit is automatically set to 1 at cold reset or soft reset and on 
occurrence of the NMI exception. This bit can also be set by software. 


When set, the Bootstrap Exception Vector (BEV) bit in the Status register causes
the TLB miss exception vector to be relocated to virtual address 0xFFFF FFFF
BFC0 0200 and the general exception vector to be relocated to address
0xFFFF FFFF BFC0 0380.


When BEV is cleared, these vectors are located at 0xFFFF FFFF 8000 0000 (TLB
refill) and 0xFFFF FFFF 8000 0180 (general).
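The vector selection above can be expressed as a small lookup. This hypothetical C helper uses only the addresses given in the text. In both BEV settings the general vector sits 0x180 above the TLB miss vector.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit virtual address of the TLB miss (refill) vector or the general
   exception vector, selected by the Status BEV bit. */
uint64_t exception_vector(bool bev, bool tlb_miss)
{
    uint64_t base = bev ? 0xFFFFFFFFBFC00200ull : 0xFFFFFFFF80000000ull;
    return base + (tlb_miss ? 0x000u : 0x180u);
}
```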


9.3.7 Interrupt Enable (IE) 
When the IE bit in the Status register is cleared, interrupts are not allowed, with
the exception of reset and the non-maskable interrupt.


This chapter describes the clock signals (“clocks”) used in the VR4300 processor.
10.1 Signal Terminology
The following terminology is used in this chapter (and book) when describing
signals:
• Rising edge indicates a low-to-high transition.
• Falling edge indicates a high-to-low transition.
• Clock-to-Q delay is the amount of time that is taken for a signal to
move from the input of a device (clock) to the output of the device (Q).


Figures 10-1 and 10-2 illustrate these terms. 


Figure 10-1 Signal Transitions (a single clock cycle, showing the low-to-high
and high-to-low transitions)


Figure 10-2 Clock-to-Q Delay (signals: Clock Input, Data In, and Data Out (Q))
10.2 Basic System Clocks


The various clock signals used in the VR4300 processor are described below.


MasterClock


The internal and external (system interface) clocks of the VR4300 are generated
from, and operate based on, MasterClock.


SyncIn/SyncOut


The VR4300 processor generates SyncOut at the same frequency as MasterClock
and aligns SyncIn with MasterClock.


SyncOut must be connected to SyncIn either directly or through an external
buffer. The processor can compensate for both output driver and input buffer
delays when aligning SyncIn with MasterClock. When SyncOut is connected
to SyncIn through an external buffer, as illustrated in Figure 10-7, the delay of
external buffers connected to the clock outputs can also be compensated.


PClock


PClock is selected by setting the frequency ratio between PClock and
MasterClock.


This ratio is set by the DivMode pins at power-on. Table 10-1 shows the
selectable frequency ratios. For details of the DivMode pin settings, refer to
Table 2-2 Clock/Control Interface Signals.


When low power mode (100 MHz model of the VR4300 and the VR4305 only)
is set by setting the RP bit of the Status register, the frequency of PClock
drops to one quarter of its normal value.


All the internal registers and latches use PClock.


Table 10-1 Frequency Ratio Between PClock and MasterClock


Product Name   DivMode Pins    Selectable Frequency Ratio (MasterClock : PClock)
VR4300         DivMode(1:0)    1:1.5*1, 1:2, 1:3, 1:4*2
VR4305         DivMode(1:0)    1:1, 1:2, 1:3
VR4310         DivMode(2:0)    1:2, 1:2.5*3, 1:3, 1:4, 1:5, 1:6


*1. Selectable with the 100 MHz model only (with the 133 MHz model, this setting is reserved).
2. Selectable with the 133 MHz model only (with the 100 MHz model, this setting is reserved).
3. Selectable with the 167 MHz model only (with the 133 MHz model, this setting is reserved).
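The ratios in Table 10-1 and the low power divider combine as follows. In this hypothetical C helper, the ratio argument is the PClock multiple of MasterClock (1.5 for 1:1.5), and setting RP divides the result by four, as described above. The helper does not check whether a ratio is legal for a given product or speed grade.

```c
#include <stdbool.h>

/* PClock frequency in MHz from the MasterClock frequency, the
   MasterClock:PClock ratio selected by DivMode, and the Status RP bit. */
double pclock_mhz(double masterclock_mhz, double ratio, bool rp)
{
    double pclock = masterclock_mhz * ratio;
    return rp ? pclock / 4.0 : pclock; /* low power mode: one quarter */
}
```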
SClock


The frequency of the system interface clock (SClock) is equal to that of
MasterClock, and SClock is synchronized with MasterClock. Because SClock
is generated from PClock, its frequency also drops to one quarter of the
normal value, like that of PClock, when low power mode (100 MHz model of
the VR4300 and the VR4305 only) is set. The outputs of the VR4300 are
driven on the edge of SClock.


SClock rises in synchronization with the first rising edge of MasterClock
immediately after ColdReset is deasserted inactive.
TClock 


TClock (transfer/receive clock) is the reference clock of the output and input 
registers of the external agent. It is also used as the global clock of the external 
agent, and a clock can be supplied to all the logic circuits in the external agent. 


TClock is the same as SClock in frequency, and its edge is accurately 
synchronized with that of SClock. When SyncIn is connected to SyncOut, 
TClock can also be synchronized with MasterClock. 


Figure 10-3 When Frequency Ratio of MasterClock to PClock is 1:1.5
(timing diagram over cycles 1 to 4: MasterClock (input), PClock (internal),
SClock (internal), TClock (output), and SysAD(31:0) as driven by and as
received by the processor)
Figure 10-4 When Frequency Ratio of MasterClock to PClock is 1:2
10.3 System Timing Parameters


As shown in Figures 10-3 and 10-4, data provided to the processor must be stable
a minimum of tDS nanoseconds (ns) before the rising edge of SClock and be held
valid for a minimum of tDH ns after the rising edge of SClock.


10.3.1 Synchronization with SClock 


Processor data becomes stable tDO ns after the rising edge of SClock. This drive
time is the sum of the maximum delay through the processor output drivers
together with the maximum clock-to-Q delay of the processor output registers.


10.3.2 Synchronization with MasterClock 


Certain processor inputs (specifically Reset) are sampled based on MasterClock.
The same setup, hold, and off times, tDS, tDH, and tDO, shown in Figures 10-3
and 10-4, apply to these inputs, measured relative to MasterClock.


10.3.3 Phase-Locked Loop (PLL) 


The processor synchronizes SyncOut, PClock, SClock, and TClock with internal
phase-locked loop (PLL) circuits that generate aligned clocks based on
SyncOut/SyncIn. By their nature, PLL circuits can generate synchronized clocks
only for MasterClock frequencies within a limited range.


Clocks generated using PLL circuits contain some inherent inaccuracy, or jitter; a
clock synchronized with MasterClock by the PLL can lead or trail MasterClock
by as much as the related maximum jitter (tMCJitter).
10.4 Low Power Mode Operation


Usually, PClock is generated from MasterClock at the frequency ratio set by
the DivMode(1:0)*1 pins (for the settings, refer to Table 2-2 Clock/Control
Interface Signals). The frequency of the system interface clock (SClock) is the
same as that of MasterClock.


To set low power mode (RP)*2, set the RP bit of the Status register by using a
transfer instruction. When RP mode is set, the processor stalls the pipeline and
enters a quiescent state (that is, the store buffer becomes empty and all cache
misses are resolved). Next, the frequency of PClock drops to one quarter of its
value in normal mode. The frequency of SClock also drops to one quarter of its
normal value (10 MHz).


A reset also restores the normal clocks.
For the procedure to set or clear the RP bit, refer to Low Power Mode in 9.3.1.


*1. In VR4300 and VR4305. In VR4310, DivMode(2:0).
2. 100 MHz model of the VR4300 and the VR4305 only
10.5 Connecting Clocks to a Phase-Locked System


When the processor is used in a phase-locked system, the external agent must
phase lock its operation to a common MasterClock. In such a system, data
transmission and data sampling have common characteristics, even if the
components have different delay values. For example, the transmission time (the
amount of time a signal takes to move from one component to another along a
trace on the board) between any two components A and B of a phase-locked
system can be calculated from the following equation:


Transmission Time = (SClock period) − (tDO for A) − (tDS for B)
− (Clock Jitter for A Max) − (Clock Jitter for B Max)


Figure 10-5 shows a block diagram of a phase-locked system using the VR4300
processor.


Figure 10-5 Phase-Locked System (MasterClock drives both the VR4300 and the
external agent; SysCmd(4:0) and SysAD(31:0) connect the two; SyncOut is
connected to SyncIn)
10.6 Connecting Clocks to a System without Phase Locking


When the VR4300 processor is used in a system in which the external agent cannot
lock its phase to a common MasterClock, the output clock TClock can clock the
rest of the system. Two clocking methodologies are described in this
section: connecting to a gate-array device and connecting to CMOS discrete
devices.


10.6.1 Connecting to a Gate-Array Device 


When the processor is connected to a gate array device, TClock is used as the 
transmit/receive clock in the gate array. 


Figure 10-6 is a block diagram of a system without phase lock, using the VR4300
processor with an external agent implemented as a gate array.
Figure 10-6 Gate-Array System without Phase Lock, Using the VR4300 Processor


Signal Transmission Time from Processor to External Agent 


In a system without phase lock, the transmission time for a signal from the
processor to an external agent composed of gate arrays can be calculated from the
following equation:


Transmission Time = (TClock period) − (tDO for VR4300)
+ (Minimum External Clock Buffer Delay)
− (External Input Register Setup Time)
− (Maximum Clock Jitter for VR4300 Internal Clocks)
− (Maximum Clock Jitter for TClock)


Signal Transmission Time from External Agent to Processor


The transmission time for a signal from an external agent composed of gate arrays
to the processor in a system without phase lock can be calculated from the
following equation:


Transmission Time = (TClock period) − (tDS for VR4300)
− (Maximum External Clock Buffer Delay)
− (Maximum External Output Register Clock-to-Q Delay)
− (Maximum Clock Jitter for TClock)
− (Maximum Clock Jitter for VR4300 Internal Clocks)
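The two gate-array equations translate term by term into a pair of budget functions. This is a C sketch with hypothetical names. All values are in ns. The external clock buffer delay enters with opposite signs in the two directions: buffering delays the agent's capture edge (more time for processor-to-agent data) but also delays its launch edge (less time for agent-to-processor data).

```c
/* Processor -> gate-array agent: transmission-time budget in ns. */
double ga_proc_to_agent_ns(double tclock_period, double tdo_vr4300,
                           double min_ext_buf_delay, double ext_in_setup,
                           double jitter_internal, double jitter_tclock)
{
    return tclock_period - tdo_vr4300 + min_ext_buf_delay - ext_in_setup
           - jitter_internal - jitter_tclock;
}

/* Gate-array agent -> processor: transmission-time budget in ns. */
double ga_agent_to_proc_ns(double tclock_period, double tds_vr4300,
                           double max_ext_buf_delay, double ext_out_clk_to_q,
                           double jitter_tclock, double jitter_internal)
{
    return tclock_period - tds_vr4300 - max_ext_buf_delay - ext_out_clk_to_q
           - jitter_tclock - jitter_internal;
}
```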
10.6.2 Connecting to a CMOS Discrete Device


To supply a synchronous clock to external CMOS discrete devices, a
delay-compensating clock buffer is used. This buffer is inserted in the
SyncOut/SyncIn synchronization path of the processor. Because the processor
aligns SyncIn with MasterClock, SyncOut and TClock are advanced relative to
MasterClock by the buffer delay, which cancels the skew the buffer introduces.


The buffered TClock can in turn be distributed through other delay-compensating
clock buffers.


The phase error of the buffered TClock is the sum of the maximum delay error of
the delay-compensating clock buffer and the maximum clock jitter of TClock.


Functioning as the global clock of the CMOS discrete devices that form the 
external agent, the buffered TClock supplies a clock to the register that samples 
the processor output and the register that drives the processor input. 


The transmission time for a signal from the processor to an external agent 
composed of CMOS discrete devices can be calculated from the following 
equation: 


Transmission Time = (1 TClock period) - (tDO for VR4300)
                    - (External Input Register Setup Time)
                    - (Maximum External Clock Buffer Delay Mismatch)
                    - (Maximum Clock Jitter for VR4300 Internal Clocks)
                    - (Maximum Clock Jitter for TClock)


Figure 10-7 is a block diagram of a system without phase lock, employing the 
VR4300 processor and an external agent composed of both a gate array and 
CMOS discrete devices. 
Figure 10-7 Gate-Array and CMOS System without Phase Lock, Using the VR4300 Processor

The transmission time for a signal from an external agent composed of CMOS
discrete devices to the processor can be calculated from the following equation:


Transmission Time = (1 TClock period) - (tDS for VR4300)
                    - (Maximum External Output Register Clock-to-Q Delay)
                    - (Maximum External Clock Buffer Delay Mismatch)
                    - (Maximum Clock Jitter for VR4300 Internal Clocks)
                    - (Maximum Clock Jitter for TClock)
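For the CMOS discrete case, the buffer delay term of the gate-array equations is replaced by the mismatch between delay-correcting clock buffers. The sketch below evaluates both CMOS equations; the names are hypothetical, times are assumed to be in nanoseconds, and the example values are illustrative only.

```python
def cmos_proc_to_agent(tclock_period, t_do, setup, buf_mismatch,
                       jitter_internal, jitter_tclock):
    """Processor -> CMOS discrete agent budget (ns)."""
    return (tclock_period - t_do - setup - buf_mismatch
            - jitter_internal - jitter_tclock)


def cmos_agent_to_proc(tclock_period, t_ds, clk_to_q, buf_mismatch,
                       jitter_internal, jitter_tclock):
    """CMOS discrete agent -> processor budget (ns)."""
    return (tclock_period - t_ds - clk_to_q - buf_mismatch
            - jitter_internal - jitter_tclock)


# Hypothetical example values only.
print(cmos_proc_to_agent(15.0, 4.0, 2.0, 0.5, 0.5, 0.5))  # 7.5
print(cmos_agent_to_proc(15.0, 3.0, 3.0, 0.5, 0.5, 0.5))  # 7.5
```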


In this clocking methodology, the hold time of data driven from the processor to
an external input register is an important parameter. To guarantee hold time, the
minimum output delay of the processor, tDO, must be greater than the sum of:


  Minimum Hold Time for the External Input Register
+ Maximum Clock Jitter for VR4300 Internal Clocks
+ Maximum Clock Jitter for TClock
+ Maximum Delay Mismatch of the External Clock Buffers
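The hold-time condition above is a strict inequality, so it can be checked as a single comparison. In this minimal sketch the function and parameter names are hypothetical, and times are assumed to be in nanoseconds.

```python
def hold_time_met(t_do_min, hold_min, jitter_internal, jitter_tclock,
                  buf_mismatch):
    """True if the minimum output delay exceeds the summed hold requirement."""
    required = hold_min + jitter_internal + jitter_tclock + buf_mismatch
    return t_do_min > required


print(hold_time_met(2.0, 0.5, 0.25, 0.25, 0.5))  # True  (2.0 > 1.5)
print(hold_time_met(1.5, 0.5, 0.25, 0.25, 0.5))  # False (1.5 is not > 1.5)
```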
This chapter describes the cache memory in detail: its place in the VR4300
memory organization, and the organization of the individual caches.


This chapter uses the following terminology:
• The data cache may also be referred to as the D-cache.


• The instruction cache may also be referred to as the I-cache.


These terms are used interchangeably throughout this book.
11.1 Memory Organization


Figure 11-1 shows the VR4300 system memory hierarchy. In the logical memory
hierarchy, the caches lie between the CPU and main memory. They are designed
to make the speedup of memory accesses transparent to the user.


Each functional block in Figure 11-1 has the capacity to hold more data than the 
block above it. For instance, physical main memory has a larger capacity than the 
caches. At the same time, each functional block takes longer to access than any 
block above it. For instance, it takes longer to access data in main memory than 
in the CPU on-chip registers. 
Figure 11-1 Logical Hierarchy of Memory


The VR4300 processor has two on-chip caches: one holds instructions (the
instruction cache), the other holds data (the data cache). The instruction and data
caches can be read in one PClock cycle.


Data writes take two PClock cycles. In the first cycle, the store address is 
generated and the tag is checked; in the second cycle, the data is written into the 
data RAM. 
11.2 Cache Organization


This section describes the organization of the on-chip data and instruction caches.
Figure 11-2 provides a block diagram of the VR4300 cache and memory model.


Figure 11-2 VR4300 Cache Support (block diagram: the VR4300 cache controller and its caches, including the I-cache, connected to main memory)


Cache Line Lengths 


A cache line is the smallest unit of information that can be fetched from main 
memory for the cache, and that is represented by a single tag. 


The line size for the instruction cache is 8 words (32 bytes) and the line size for 
the data cache is 4 words (16 bytes). 


For cache tags, refer to 11.2.1 Organization of the Instruction Cache (I-Cache) 
and 11.2.2 Organization of the Data Cache (D-Cache). 


Cache Sizes 


The VR4300 instruction cache is 16 KB; the data cache is 8 KB.
11.2.1 Organization of the Instruction Cache (I-Cache)


Each line of I-cache data (although it is actually an instruction, it is referred to as
data to distinguish it from its tag) has an associated 21-bit tag that contains a
20-bit physical address and a Valid bit.


The VR4300 processor I-cache has the following characteristics:
• direct-mapping method
• indexed with a virtual address
• checked with a physical tag
• organized with an 8-word (32-byte) cache line.


Figure 11-3 shows the format of an 8-word (32-byte) I-cache line. 


Tag (21 bits):   bit 20 = V (1 bit), bits 19:0 = PTag (20 bits)
Data (256 bits): bits 255:0 = Data


PTag : Physical tag (bits 31:12 of the physical address)
V    : Valid bit
Data : Cache data


Figure 11-3 VR4300 8-Word I-Cache Line Format
11.2.2 Organization of the Data Cache (D-Cache)


Each line of D-cache data has an associated 22-bit tag that contains a 20-bit 
physical address, a Valid bit, and a Dirty bit. 


The VR4300 processor D-cache has the following characteristics:
• write-back
• direct-mapping method
• indexed with a virtual address
• checked with a physical tag
• organized with a 4-word (16-byte) cache line.


Figure 11-4 shows the format of a 4-word (16-byte) D-cache line. 


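Tag (22 bits):   bit 21 = V (1 bit), bit 20 = D (1 bit), bits 19:0 = PTag (20 bits)
Data (128 bits): bits 127:0 = Data


V    : Valid bit
D    : Dirty bit (refer to 11.4 Cache States)
PTag : Physical tag (bits 31:12 of the physical address)
Data : D-cache data


Figure 11-4 VR4300 4-Word Data Cache Line Format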
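The tag layouts in Figures 11-3 and 11-4 can be modeled as packed integers. This is a software sketch of the field positions only; it does not imply how the hardware physically stores tags, and the function names are hypothetical.

```python
def ptag_of(paddr):
    """PTag is bits 31:12 of the physical address (20 bits)."""
    return (paddr >> 12) & 0xFFFFF


def pack_icache_tag(valid, ptag):
    """21-bit I-cache tag: bit 20 = V, bits 19:0 = PTag (Figure 11-3)."""
    if not 0 <= ptag < (1 << 20):
        raise ValueError("PTag is 20 bits")
    return (int(valid) << 20) | ptag


def pack_dcache_tag(valid, dirty, ptag):
    """22-bit D-cache tag: bit 21 = V, bit 20 = D, bits 19:0 = PTag (Figure 11-4)."""
    if not 0 <= ptag < (1 << 20):
        raise ValueError("PTag is 20 bits")
    return (int(valid) << 21) | (int(dirty) << 20) | ptag


tag = ptag_of(0x12345678)               # 0x12345
print(hex(pack_icache_tag(True, tag)))        # 0x112345
print(hex(pack_dcache_tag(True, True, tag)))  # 0x312345
```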
11.2.3 Accessing the Caches


Figure 11-5 shows the virtual address (VA) index into the caches. The number of 
virtual address bits used to index the instruction and data caches depends on the 
cache size. 


Data Cache Addressing 


VA(12:4) is used. Since the cache size is 8 KB, the most significant bit is VA12. 
Furthermore, since the line size is 4 words (16 bytes), the least-significant bit is 
VA4. 


Instruction Cache Addressing 


VA(13:5) is used. Since the cache size is 16 KB, the most-significant bit is VA13.
Furthermore, since the line size is 8 words (32 bytes), the least-significant bit is
VA5.
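The index ranges above follow from the cache and line sizes: both caches have 512 lines (8 KB / 16 B and 16 KB / 32 B), so each uses 9 index bits starting just above the line offset. A minimal sketch, with hypothetical names:

```python
def cache_index(vaddr, cache_size, line_size):
    """Line index of a direct-mapped, virtually indexed cache.

    Uses VA bits (log2(cache_size) - 1) down to log2(line_size); both sizes
    are assumed to be powers of two.
    """
    num_lines = cache_size // line_size
    offset_bits = line_size.bit_length() - 1  # log2(line_size)
    return (vaddr >> offset_bits) & (num_lines - 1)


DCACHE = (8 * 1024, 16)   # 8 KB, 16-byte lines: index is VA(12:4)
ICACHE = (16 * 1024, 32)  # 16 KB, 32-byte lines: index is VA(13:5)

print(cache_index(0x2000, *DCACHE))  # 0: VA13 is outside VA(12:4)
print(cache_index(0x2000, *ICACHE))  # 256: VA13 is the top I-cache index bit
```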


Figure 11-5 Cache Data and Tag Organization (the virtual address index, VA(12:4) for the 8 KB D-cache and VA(13:5) for the 16 KB I-cache, selects a tag line, with fields V, Tag, and D, and the corresponding data line)
11.3 Cache Operations


As described earlier, caches provide temporary data storage, and they make the 
speedup of memory accesses transparent to the user. In general, the processor 
accesses cache-resident instructions or data through the following procedure: 


1. The processor, through the on-chip cache controller, attempts to access the 
next instruction or data in the appropriate cache. 


2. The cache controller checks to see if this requested instruction or data is 
present in the cache. 


   • If the instruction/data is present, the processor retrieves it. This is
     called a cache hit.


   • If the instruction/data is not present in the cache, the cache controller
     must retrieve it from main memory. This is called a cache miss.


3. The processor retrieves the instruction/data from the cache and operation 
continues. 
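The hit/miss check in the procedure above can be sketched as a toy model of a direct-mapped, virtually indexed, physically tagged cache that tracks tags only. The class and method names are hypothetical, and the model refills the tag immediately on a miss instead of modeling the memory transfer.

```python
class DirectMappedCache:
    """Toy virtually indexed, physically tagged, direct-mapped cache (tags only)."""

    def __init__(self, cache_size, line_size):
        self.num_lines = cache_size // line_size
        self.offset_bits = line_size.bit_length() - 1  # log2(line_size)
        self.valid = [False] * self.num_lines
        self.ptag = [0] * self.num_lines

    def _index(self, vaddr):
        # Step 1: the virtual address selects exactly one line.
        return (vaddr >> self.offset_bits) & (self.num_lines - 1)

    def access(self, vaddr, paddr):
        """Return 'hit' or 'miss'; on a miss, refill the line's tag."""
        i = self._index(vaddr)
        tag = (paddr >> 12) & 0xFFFFF  # PTag = physical address bits 31:12
        # Step 2: compare the stored physical tag for a valid line.
        if self.valid[i] and self.ptag[i] == tag:
            return "hit"
        # Miss: the line is fetched from main memory (not modeled), then
        # the access continues from the cache (step 3).
        self.valid[i] = True
        self.ptag[i] = tag
        return "miss"


dcache = DirectMappedCache(8 * 1024, 16)
print(dcache.access(0x0100, 0x0100))  # miss (cold line)
print(dcache.access(0x0100, 0x0100))  # hit
print(dcache.access(0x2100, 0x2100))  # miss: same index, different PTag
```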


It is possible for the same data to be in two places simultaneously: main memory 
and cache. This data is kept consistent through the use of a write-back 
methodology; that is, modified data is not written back to main memory until the 
cache line is to be replaced. 


Instruction and data cache line replacement operations are described in the 
following sections. 
11.3.1 Cache Write Policy


The VR4300 processor manages its data cache by using a write-back policy*; that
is, it stores write data into the cache instead of writing it directly to main
memory. Some time later, this data is independently transferred to main
memory. In the VR4300 implementation, a modified cache line is not written back
to main memory until the cache line is to be replaced, either in the course of
satisfying a cache miss or during the execution of a write-back CACHE
instruction.


When a cache miss occurs and the processor writes the contents of a cache line
back to main memory, it does not ordinarily retain a copy of the cache line,
and the state of the cache line is changed to Clean.


11.3.2 Data Cache Line Replacement 


Since the data cache uses a write-back methodology, a cache line load is issued to 
main memory on a load or store miss, as described below. After the data from the 
main memory is written to the data cache, the pipeline resumes execution. 


The line replacement sequence is based on a “Critical Doubleword First” scheme
(refer to subblock ordering in 12.2.1 Physical Addresses). The processor restarts
its pipeline as soon as main memory supplies the desired word in the first
doubleword of a block transfer. This sequence is summarized as follows:


1. Move the data physical address to the SysAD(31:0) bus. At the same time, move
   the dirty cache line to the write buffer.


2. On the rising edge of SClock, read the data from main memory, receiving
   the doubleword that contains the desired data first, as two words.


3. Receive the remaining doubleword in word units. For all loads, move the
   data to the target register. For byte, halfword, and word stores, a
   read-modify-write is necessary: read the 64-bit data from main memory, merge
   the new data into it, then write the 64-bit data to the cache. While this is
   being done, the data cache is interlocked to prevent any subsequent
   instruction from accessing this particular cache line.
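The merge in step 3 can be sketched as masking the stored bytes into the 64-bit doubleword read from memory. This sketch assumes big-endian byte numbering within the doubleword (byte offset 0 is bits 63:56); the VR4300 can also run little-endian, in which case the shift would differ. The function name is hypothetical.

```python
def merge_store(doubleword, offset, data, size):
    """Merge a 1-, 2-, or 4-byte store into a 64-bit doubleword.

    offset: byte offset of the store within the doubleword (0..7).
    Assumes big-endian byte numbering: byte 0 occupies bits 63:56.
    """
    if size not in (1, 2, 4) or offset % size or offset + size > 8:
        raise ValueError("unaligned or unsupported store")
    shift = (8 - offset - size) * 8
    mask = ((1 << (size * 8)) - 1) << shift
    # Clear the target bytes, then insert the new data in their place.
    return (doubleword & ~mask) | ((data << shift) & mask)


print(hex(merge_store(0x0011223344556677, 0, 0xAA, 1)))    # 0xaa11223344556677
print(hex(merge_store(0x0011223344556677, 2, 0xBEEF, 2)))  # 0x11beef44556677
```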


Rules for replacement on data load and data store misses are given below. 


* An alternative to this is a write-through cache, in which information is written simultaneously to 
cache and memory. 
Data Load Miss




If the missed cache line is not dirty, it is replaced with a new line. 


If the missed line is dirty, it is moved to the write buffer. A new line replaces the 
missed line, and the data in the write buffer is written to the main memory. 


Data Store Miss 


If the missed cache line is not dirty, it is replaced with the new cache line merged
with the store data.


If the missed cache line is dirty, it is moved to the write buffer. A new cache line
is merged with the store data and written to the cache, and the data in the write
buffer is written to main memory. The data is written sequentially, starting from
the first address of the block (refer to sequential ordering in 12.2.1 Physical
Addresses).
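The load-miss and store-miss rules can be sketched in a few lines. The `DLine` type, the list used as a write buffer, and the function name are hypothetical; draining the write buffer to main memory is not modeled. Marking the line dirty after a store is an assumption that follows from the write-back policy described in 11.3.1.

```python
from dataclasses import dataclass


@dataclass
class DLine:
    """Hypothetical software model of one 16-byte D-cache line."""
    valid: bool = False
    dirty: bool = False
    ptag: int = 0
    data: bytes = bytes(16)


def handle_dcache_miss(line, new_ptag, fetched, write_buffer, store=None):
    """Replace `line` on a load miss (store=None) or a store miss.

    fetched: the 16 bytes read from main memory for the new line.
    store: optional (byte_offset, payload) merged into the new line.
    write_buffer: list collecting dirty victim lines.
    """
    if line.valid and line.dirty:
        # Dirty victim: move it to the write buffer before replacement.
        write_buffer.append((line.ptag, line.data))
    data = bytearray(fetched)
    if store is not None:
        offset, payload = store
        data[offset:offset + len(payload)] = payload  # merge the store data
    line.valid, line.ptag, line.data = True, new_ptag, bytes(data)
    # Assumption: under write-back, a line written by a store becomes dirty.
    line.dirty = store is not None
    return line
```

A clean victim is simply overwritten, so nothing is added to the write buffer on that path.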


The data cache miss stall, in PClock cycles, is shown in Table 11-1.


Table 11-1 Stall Cycle Count for Data Cache Miss


Number of Cycles   Operation
1                  DC stage stall
1                  Transfer address to write buffer and wait for the pipeline start signal
1 to 2             Synchronize with SClock and transfer address to internal SysAD bus
2                  Transfer to external SysAD bus
M                  Time needed to access memory, measured in PClock cycles
2                  Transfer the cache line from memory to the SysAD bus
1                  Transfer the cache line from the external to internal bus and to D-cache bus
0                  Restart the DC stage
11.3.3 Instruction Cache Line Replacement


For an instruction cache miss, refill is done using sequential ordering, reading 
from the first word of the requested cache line. 


During an instruction cache miss, the processor issues a memory read request;
that is, the requested cache line is read from main memory and written to the
instruction cache. The pipeline then resumes execution, and the instruction
cache is accessed again.

The replacement sequence for an instruction cache miss is: 

1. Move the instruction physical address to the SysAD(31:0) bus.


2. On the rising edge of SClock, read the instruction data from main memory
   and write it to the instruction cache.


3. Restart the pipeline operation. 


The instruction cache miss stall, in PClock cycles, is shown in Table 11-2.


Table 11-2 Stall Cycle Count for Instruction Cache Miss


Number of Cycles   Operation
1                  RF stage stall
1                  Transfer address to write buffer and wait for the pipeline start signal
1 to 2             Synchronize with SClock and transfer address to internal SysAD bus
2                  Transfer to external SysAD bus
M                  Time needed to access memory, measured in PClock cycles
8                  Transfer the cache line from memory to the SysAD bus
1                  Transfer the cache line from the external to internal bus and to I-cache bus
0                  Restart the RF stage
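The rows of Tables 11-1 and 11-2 can be totaled to estimate the overall stall for a given memory access time M. This sketch simply adds the rows, assuming they occur back to back without overlap (the manual does not state this explicitly); the data and function names are hypothetical.

```python
# Each row is (min_cycles, max_cycles); "M" is the memory access time row.
DCACHE_MISS = [(1, 1), (1, 1), (1, 2), (2, 2), "M", (2, 2), (1, 1), (0, 0)]
ICACHE_MISS = [(1, 1), (1, 1), (1, 2), (2, 2), "M", (8, 8), (1, 1), (0, 0)]


def stall_range(rows, m):
    """Return (min, max) stall in PClock cycles for memory access time m."""
    low = high = 0
    for row in rows:
        a, b = (m, m) if row == "M" else row
        low += a
        high += b
    return low, high


print(stall_range(DCACHE_MISS, 10))  # (18, 19)
print(stall_range(ICACHE_MISS, 10))  # (24, 25)
```

Under this assumption, a data cache miss costs 8 + M to 9 + M PClock cycles and an instruction cache miss costs 14 + M to 15 + M.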
11.4 Cache States


Cache Line 


The four terms below are used to describe the state of a cache line:
• Valid: a cache line that contains valid information.


• Dirty: a valid cache line containing data that has changed since it
  was loaded from main memory.


• Clean: a valid cache line containing data that has not changed since it
  was loaded from main memory.


• Invalid: a cache line that does not contain valid information. Such a line
  must be marked invalid and cannot be used. For example, after a Soft Reset,
  software sets all cache lines to invalid. A cache line in any state other
  than invalid is assumed to contain valid information.
  Neither a Cold Reset nor a Soft Reset invalidates the caches; software must
  invalidate them.


Data Cache 


The data cache supports three cache states:
• invalid
• clean
• dirty


Instruction Cache
The instruction cache supports two cache states:
• invalid
• valid


The state of a cache line that contains valid information may change when the
processor executes a CACHE instruction. For details of the CACHE instruction,
refer to Chapter 16 CPU Instruction Set Details.


11.5 Cache State Transition Diagrams 


The following section describes the cache state diagrams for the data and 
instruction caches. These state diagrams do not cover the initial state of the 
system, since the initial state is system-dependent. 
11.5.1 Data Cache State Transition


The following diagram illustrates the data cache state transition sequence. A load 
or store operation may include one or more of the atomic read and/or write 
operations shown in the state diagram below, which may cause cache state 
transitions. 


• Read(1) indicates a read operation from memory to cache, inducing a
  cache state transition.


• Write(1) indicates a write operation from the processor to cache,
  inducing a cache state transition.


• Read(2) indicates a read operation from cache to the processor, which
  induces no cache state transition.


• Write(2) indicates a write operation from the processor to cache,
  which induces no cache state transition.


Figure 11-6 Data Cache State Diagram (states: invalid, clean, dirty; transitions labeled Read(1), Write(1), Read(2), Write(2), Write Back, and CACHE instruction)
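The state diagram can be sketched as a transition table. The edges below are one plausible reading based on the Read/Write definitions above and on 11.3.1 (a line written back to memory becomes Clean); they are assumptions to check against the original figure. The event names, and the treatment of the CACHE instruction as a single invalidating operation, are hypothetical simplifications.

```python
INVALID, CLEAN, DIRTY = "invalid", "clean", "dirty"

# Assumed reading of Figure 11-6; unlisted (state, event) pairs are undefined.
_TRANSITIONS = {
    (INVALID, "read1"): CLEAN,     # line filled from memory
    (INVALID, "write1"): DIRTY,    # store that fills and modifies the line
    (CLEAN, "read2"): CLEAN,       # cache-to-processor read, no change
    (CLEAN, "write1"): DIRTY,      # first write to a clean line
    (DIRTY, "read2"): DIRTY,
    (DIRTY, "write2"): DIRTY,      # further writes to a dirty line
    (DIRTY, "write_back"): CLEAN,  # contents written back to memory
}


def dcache_next_state(state, event):
    """Next D-cache line state for a (hypothetical) event name."""
    if event == "cache_invalidate":  # assumed: an invalidating CACHE operation
        return INVALID
    try:
        return _TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {event!r} in state {state!r}") from None
```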
11.5.2 Instruction Cache State Transition


The following diagram illustrates the instruction cache state transition sequence. 


• Read(1) indicates a read operation from main memory to cache,
  inducing a cache state transition.


• Read(2) indicates a read operation from cache to the processor, which
  induces no cache state transition.


Figure 11-7 Instruction Cache State Diagram (states: invalid, valid; transitions labeled Read(1), Read(2), and CACHE instruction)


11.6 Manipulation of the Caches by an External Agent 


The VR4300 does not provide any mechanism for an external agent to examine
and manipulate the state and contents of the caches.
12


The System interface allows the processor to access the external resources needed
to process cache misses and accesses to uncached areas, while permitting an
external agent to access some of the processor's internal resources.


This chapter describes the System interface between the processor and the 
external agent. 


The VR4300 uses a subset of the System interface contained on the VR4400 and
VR4200.
12.1 Terminology


The following terms are used in this chapter: 


• An external agent is any device connected to the processor, over the
  System interface, that processes requests issued by the processor.


• A system event is an event that occurs within the processor and
  requires access to external resources. System events include: an
  instruction fetch that misses in the instruction cache; a load/store
  instruction that misses in the data cache; an uncached load or store
  instruction; and the execution of a CACHE instruction.


• Sequence refers to the series of requests that a processor generates to
  process a system event.


• Protocol refers to the cycle-by-cycle signal transitions that occur on
  the System interface pins to issue a processor request or an external
  request.


• Syntax refers to the definition of bit patterns on encoded buses, such
  as the command bus.


• Block indicates any data transfer of 8 bytes or longer across the
  System interface.


• Single indicates any data transfer of 7 bytes or shorter across the
  System interface.


• Fetch refers to the read of information from the instruction cache.


• Load refers to the read of information from the data cache.
12.2 System Interface Description


The processor uses the System interface to access the external resources required
to service cache misses and accesses to uncached areas.


12.2.1 Physical Addresses 


Physical addresses are output on SysAD(31:0) in the address cycle. For single
read and single write requests, the address is determined by the data length as
follows:


• If the data is a word (4 bytes), the low-order 2 bits of the address are
  0.


• If the data is a halfword (2 bytes), the low-order bit of the address
  is 0.

• If the data is 1, 3, 5, 6, or 7 bytes, the supplied address is a byte
  address (5-, 6-, or 7-byte data is divided into two single write
  requests).
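The rules above can be sketched as a function that returns the address placed on SysAD(31:0) for a single request. The function name is hypothetical, and so is the choice to clear the low-order bits rather than reject unaligned input. How a 5-, 6-, or 7-byte transfer is divided into two requests is not specified here, so that case is only flagged.

```python
def single_request(addr, size):
    """Return (address_on_SysAD, needs_two_requests) for a single transfer."""
    if size == 4:
        return addr & ~0x3, False   # word: low-order 2 bits are 0
    if size == 2:
        return addr & ~0x1, False   # halfword: low-order bit is 0
    if size in (1, 3):
        return addr, False          # byte address
    if size in (5, 6, 7):
        return addr, True           # divided into two single write requests
    raise ValueError("single transfers are 7 bytes or shorter")


print(single_request(0x1003, 4))  # (4096, False), i.e. address 0x1000
```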


When a doubleword (2 words), 4 words, or 8 words are transferred, a block
request is issued. Block read requests and block write requests differ in the
physical address that is output, as follows.


Block Write Request 


The physical address when the block write request is issued is always aligned with 
the first word address of the block (sequential ordering). 


Block Read Request 


• Instruction cache read request


  When a block read request is issued because of a miss in the instruction
  cache, the physical address output is aligned with the 8-word block that
  contains the requested word (the low-order 5 bits are 0). Figure 12-1 shows
  the sequence in which data is transferred from main memory when a block read
  request is issued for the instruction cache. For an instruction cache read
  request, data is always read starting from W0 (sequential ordering).
Transfer sequence:  1    2    3    4    5    6    7    8   (sequential ordering)
Word:               W0   W1   W2   W3   W4   W5   W6   W7
(The output physical address is that of W0.)


Figure 12-1 Data Sequence on Instruction Cache Read Request


• Data cache read request


  When a block read request is issued because of a miss in the data cache, the
  physical address output is aligned with the doubleword that contains the
  requested data (the low-order 3 bits are 0). Figure 12-2 shows the sequence
  in which data is transferred from main memory when a block read request is
  issued for the data cache. For a data cache read request, reading starts, in
  word units, with the doubleword that contains the necessary data (W2 in this
  case) (refer to subblock ordering in 12.12.2 Sequential and Subblock
  Ordering).


Transfer sequence:  3    4    1    2   (subblock ordering)
Word:               W0   W1   W2   W3
(The requested word is W2; the output physical address is that of the doubleword containing W2.)


Figure 12-2 Data Sequence on Data Cache Read Request
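The two orderings in Figures 12-1 and 12-2 can be sketched as functions that list word numbers in transfer order. The D-cache version assumes, from the doubleword-aligned output address, that the requested doubleword always starts at its even word, followed by the other doubleword of the 4-word line; the function names are hypothetical.

```python
def icache_refill_order():
    """I-cache miss: always W0..W7 (sequential ordering, Figure 12-1)."""
    return list(range(8))


def dcache_refill_order(requested_word):
    """D-cache miss: words of the 4-word line in transfer order (subblock)."""
    if not 0 <= requested_word < 4:
        raise ValueError("D-cache lines hold 4 words")
    first_dw = requested_word // 2
    order = []
    for dw in (first_dw, first_dw ^ 1):  # requested doubleword first
        order += [2 * dw, 2 * dw + 1]
    return order


print(dcache_refill_order(2))  # [2, 3, 0, 1]: W2 and W3 first, as in Figure 12-2
```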
12.2.2 Interface Buses


Figure 12-3 shows the primary communication buses for the System interface: a
32-bit address/data bus, SysAD(31:0), and a 5-bit command bus, SysCmd(4:0).
The SysAD and SysCmd buses are bidirectional; that is, they are driven by
the processor to issue a processor request, and by the external agent to issue an
external request (refer to 12.4 Processor and External Requests).
A request through the System interface consists of:

• an address


• a System interface command that specifies the nature of the request


• response data for a read request, or write data for a write request


Figure 12-3 System Interface Buses (SysAD(31:0) and SysCmd(4:0) connect the VR4300 and the external agent)
12.2.3 Address and Data Cycles


The SysCmd(4:0) bus identifies the contents of the SysAD(31:0) bus during any
cycle in which it is valid. Cycles in which the SysAD(31:0) bus contains a valid
address are called address cycles. Cycles in which the SysAD(31:0) bus contains
valid data are called data cycles. The most significant bit of the SysCmd(4:0) bus
is always used to indicate whether the current cycle is an address cycle or a data
cycle. Validity is determined by the state of the EValid and PValid signals
(described in 12.2.5 Handshake Signals).


When the VR4300 processor is driving the SysAD(31:0) and SysCmd(4:0) buses,
the System interface is in master state. When the external agent is driving them,
the System interface is in slave state.


• When the processor is master, it asserts the PValid signal when the
  SysAD(31:0) and SysCmd(4:0) buses are valid.


• When the processor is slave, an external agent asserts the EValid
  signal when the SysAD(31:0) and SysCmd(4:0) buses are valid.


When the PValid or EValid signal is active, SysCmd(4:0) indicates the
following:


• During address cycles [SysCmd4 = 0], the remainder of the
  SysCmd(4:0) bus, SysCmd(3:0), contains a System interface
  command (the encoding of System interface commands is detailed in
  12.11 System Interface Commands and Data Identifiers).


• During data cycles [SysCmd4 = 1], the remainder of the
  SysCmd(4:0) bus, SysCmd(3:0), contains a data identifier
  (the encoding of data identifiers is detailed in 12.11 System Interface
  Commands and Data Identifiers).
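The SysCmd4 rule above can be sketched as a classifier for a sampled bus value. It only separates address cycles from data cycles and returns the raw SysCmd(3:0) field; decoding that field (section 12.11) is out of scope here. The function name is hypothetical.

```python
def decode_syscmd(syscmd, valid):
    """Classify SysCmd(4:0) sampled while PValid or EValid is asserted.

    Returns ('address', command) or ('data', identifier), where the second
    element is the raw 4-bit SysCmd(3:0) field, or None if not valid.
    """
    if not valid:
        return None  # bus contents are not meaningful in this cycle
    if not 0 <= syscmd < 32:
        raise ValueError("SysCmd is a 5-bit bus")
    kind = "data" if syscmd & 0x10 else "address"  # SysCmd4 selects the cycle type
    return kind, syscmd & 0x0F


print(decode_syscmd(0b11010, True))  # ('data', 10)
```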
12.2.4 Issue Cycles


Processor Request 


There are two types of processor issue cycles:
• processor read request
• processor write request

The issuance cycle of a processor read/write request is determined by the status
of the EOK signal. The issuance cycle is the address cycle in which a processor
request becomes valid. Each processor request has only one issuance cycle.


For an address cycle to become the issuance cycle, the external agent must assert
the EOK signal one cycle before the address cycle of the processor read/write
request, as shown in Figure 12-4.


The external agent must keep the EOK signal asserted until the address cycle has
started.


Figure 12-4 EOK Signal Status of Processor Request (timing diagram over SCycles 1 to 6 showing SClock (internal), EOK (input), and the issuance cycle)


The processor repeatedly outputs the address cycle until the address cycle of the
processor request becomes the issuance cycle. On the VR4300, therefore, the
issuance cycle is the address cycle that follows the cycle in which the EOK signal
becomes active, and the address cycle is repeated until then. Figure 12-5
illustrates how the address cycle is extended by the EOK signal.
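The EOK rule can be sketched as a search over a per-cycle trace. In this simplified model, EOK is a logical "asserted" flag per SClock cycle, independent of the pin's electrical polarity, and the processor repeats its address cycle from `first_address_cycle` until an address cycle follows a cycle with EOK asserted. The function and parameter names are hypothetical.

```python
def issuance_cycle(eok_asserted, first_address_cycle):
    """Return the cycle in which the processor request is issued, or None.

    eok_asserted[t] is True if EOK is asserted in SClock cycle t.
    """
    for t in range(max(first_address_cycle, 1), len(eok_asserted)):
        if eok_asserted[t - 1]:  # EOK active in the preceding cycle
            return t
    return None  # still repeating the address cycle at the end of the trace


eok = [False, False, False, True, True, True]
print(issuance_cycle(eok, 1))  # 4: address cycles 1 to 3 are repeated
```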
Figure 12-5 Address Cycle Extended by EOK Signal (timing diagram over SCycles 1 to 7 showing the repeated address cycle on SysAD, EOK (input), and the issuance cycle)


Processor and External Requests 


The processor accepts external requests, even while attempting to issue a
processor request, by releasing the System interface to slave state in response to
the EReq signal from the external agent.


When the issuance of a processor request competes with an external request, the
processor either:


• completes the issuance of the processor request before the external
  request is accepted, or


• releases the System interface to slave state without completing the
  issuance of the processor request.


In the latter case, the processor issues the processor request (provided the
request is still necessary) after the external request is completed.
12.2.5 Handshake Signals


The processor manages the flow of requests through the following six control 
signals: 


EOK Signal 


This signal is used by the external agent to indicate whether it can accept a new
read or write transaction.


EReq, PMaster and PReq Signals 


These signals are used to transfer control of the SysAD(31:0) and SysCmd(4:0)
buses. The EReq signal is used by an external agent to indicate a need to control
the interface. The PMaster signal is deasserted by the processor when it transfers
control of the System interface to the external agent. The PReq signal is used by
the processor to request control of the System interface from the external agent
when the external agent holds it.


PValid and EValid Signals 


The VR4300 processor uses the PValid signal, and the external agent uses the
EValid signal, to indicate valid command/data on the SysCmd(4:0)/SysAD(31:0)
buses.
12.3 System Interface Protocols


Figure 12-6 shows the register-to-register operation of the System interface. That 
is, output signals of the processor come directly from output registers and begin 
to change in synchronization with the rising edge of SClock. 


Input signals to the processor are fed directly to input registers that latch these 
input signals with the rising edge of SClock. 


Figure 12-6 System Interface Register-to-Register Operation (VR4300 output data and input data pass through registers clocked by SClock)


12.3.1 Master and Slave States 


When the VR4300 processor is driving the SysAD(31:0) and SysCmd(4:0) buses,
the System interface is in master state. When the external agent is driving these
buses, the System interface is in slave state.


In master state, the processor asserts the PValid signal whenever the 
SysAD(31:0) and SysCmd(4:0) buses are valid. 


In slave state, the external agent asserts the EValid signal whenever the 
SysAD(31:0) and SysCmd(4:0) buses are valid. 
12.3.2 Moving from Master to Slave State


The processor is the default master of the System interface. An external agent
becomes master of the System interface through external arbitration, or after a
processor read request. The external agent returns mastership to the processor
after an external request completes.


The System interface remains in master state unless one of the following occurs:


• The external agent requests and is granted control of the System
  interface (external arbitration).


• The processor issues a read request (uncompelled change to slave
  state).


The following sections describe these two cases. 


12.3.3 External Arbitration 


The System interface must be in slave state for the external agent to issue an 
external request through the System interface. The transition from master state to 
slave state is arbitrated by the processor using the System interface handshake 
signals EReq and PMaster. This transition is described by the following 
procedure: 


1. An external agent indicates that it needs to issue an external request by
   asserting the EReq signal.


2. When the processor is ready to accept an external request, it releases the
   System interface from master to slave state by deasserting the PMaster signal.


3. The System interface returns to master state as soon as the issue of the 
external request is completed. 


This process is described in 12.6.6 External Arbitration Protocol. 
12.3.4 Uncompelled Change to Slave State


An uncompelled change to slave state is the transition of the System interface from
master state to slave state, performed by the processor itself when a processor read
request is pending. The PMaster signal is deasserted automatically after a read
request. An uncompelled change to slave state occurs in the first cycle after
the issue cycle of a processor read request.


When the processor returns from the uncompelled change depends on the cache
status. The processor returns to master state when the next external request (a
read response or another external request) completes after the uncompelled
change to slave state.


An external agent must detect that the processor has performed an uncompelled
change to slave state before it begins driving the SysAD(31:0) bus along with the
SysCmd(4:0) bus. As long as the System interface is in slave state, the external
agent can begin an external request without arbitrating for the System interface;
that is, without asserting the EReq signal.


If EReq is inactive when the external request completes, the System interface
automatically returns to master state.
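The mastership rules in 12.3.2 through 12.3.4 can be sketched as a small state model. It is a simplification: the class and method names are hypothetical, the cache-status dependency of the return from an uncompelled change is not modeled, and the rule "return to master only if EReq is inactive" (stated for the uncompelled case) is assumed to apply after any external request.

```python
class SystemInterface:
    """Simplified model of System interface mastership (12.3.2 to 12.3.4)."""

    def __init__(self):
        self.state = "master"  # the processor is the default master

    def external_arbitration(self, ereq, processor_ready):
        # EReq asserted and the processor ready: it deasserts PMaster.
        if self.state == "master" and ereq and processor_ready:
            self.state = "slave"

    def issue_read_request(self):
        # Uncompelled change: PMaster is deasserted after a read request.
        if self.state == "master":
            self.state = "slave"

    def external_request_completed(self, ereq):
        # Return to master when the external request completes with EReq inactive.
        if self.state == "slave" and not ereq:
            self.state = "master"


si = SystemInterface()
si.issue_read_request()                    # uncompelled change -> slave
si.external_request_completed(ereq=False)  # read response done -> master
print(si.state)  # master
```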


12.4 Processor and External Requests 
There are two categories of requests: processor requests and external requests. 


When a system event occurs, the processor issues a request through the System
interface to access some external resource necessary to service this event. For this
to occur, the System interface must be connected to an external agent that
coordinates access to system resources. An external agent requesting access
to an internal resource of the processor issues an external request.


Processor requests include the following:
• read requests, which provide a read address to an external agent


• write requests, which provide an address and a single or block of data
  to be written to an external agent.


External requests include the following:


• read responses, which provide a block or single transfer of data from
  an external agent in response to read requests


• write requests, which provide an address and a word of data to be
  written to a processor resource

When an external agent receives a read request, it accesses the specified resource
and returns the response data as a read response, which may be returned at any
time after the read request is issued.


A processor read request is completed after the last response data has been 
received from the external agent. A processor write request is completed after the 
last word of data has been transferred. 


The processor does not issue another request while a read request is pending
(that is, after issuing the read request and before receiving its response data).


System events and requests are shown in Figure 12-7. 


Figure 12-7 Requests and System Events (system events: fetch miss, load miss, store miss, load/store to uncached area, CACHE instructions; processor requests from the VR4300 to the external agent: read, write; external requests from the external agent to the VR4300: read response, write)
12.4.1 Processor Requests


A processor request is a request through the System interface to access some
external resource. Processor requests are either read or write requests.


Outline of Requests


Read request asks for a block, word, or partial word of data either from main 
memory or from another system resource. 


Write request provides a block, word, or partial word of data to be written either 
to main memory or to another system resource. 


Request Issuance 


The processor issues requests in a strict sequential order; that is, the processor is 
only allowed to have one request pending at any time. For example, the processor 
issues a read request and waits for a read response before issuing any subsequent 
requests. The processor issues a write request only if there are no read requests 
pending. 


Request Control 


The processor has the input signal EOK to allow an external agent to control the 
flow of processor requests. 


The processor request cycle sequence is shown in Figure 12-8. 


[Figure: VR4300 and external agent. 1. The processor issues a read or write request. 2. The external system controls acceptance of requests by asserting the EOK signal.]

Figure 12-8 Processor Request Flow
12.4.2 Processor Read Request


When a processor issues a read request, the external agent must access the 
specified resource and return the requested data. 


A processor read request can be split from the external agent's response data; in
other words, the external agent can initiate an unrelated external request before it
returns the response data for a processor read. A processor read request is
completed after the last word of response data has been received from the external
agent.


Processor read requests that have been issued, but whose data has not yet been
returned, are said to be pending. A read request remains pending until the
requested read data is returned.


Note that the data identifier associated with the response data can indicate that the 
response data is erroneous, causing the processor to generate a bus error 
exception. 


The external agent must be capable of accepting a new processor read request at
any time when the following two conditions are met:

• No processor read request is pending.

• The EOK signal has been asserted for two or more cycles.
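The two acceptance conditions above can be written as a predicate that a model of the external agent might evaluate each cycle. This is a minimal sketch: `agent_must_accept_request` and its parameters are hypothetical names, and the caller is assumed to track how many consecutive cycles EOK has been asserted. The same two conditions apply to processor write requests (12.4.3).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an NEC API): true when the conditions in
   12.4.2/12.4.3 require the external agent to accept a new processor
   read or write request. */
static bool agent_must_accept_request(bool read_pending,
                                      unsigned eok_asserted_cycles)
{
    /* Condition 1: no processor read request is pending. */
    if (read_pending)
        return false;
    /* Condition 2: EOK has been asserted for two or more cycles. */
    return eok_asserted_cycles >= 2u;
}
```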


12.4.3 Processor Write Request 


When a processor issues a write request, the specified external resource is 
accessed and the data is written to it. 


A processor write request is completed after the last word of data has been 
transferred to the external agent. 


The external agent must be capable of accepting a new processor write request at
any time when the following two conditions are met:

• No processor read request is pending.

• The EOK signal has been asserted for two or more cycles.
12.4.4 External Requests


External requests include read response and write requests. 


Outline of Requests 
Read response returns data in response to a processor read request. 


Write request provides data to be written to the processor’s internal resource. 


Request Control 


The processor controls the flow of external requests through the arbitration signals 
EReq and PMaster, as shown in Figure 12-9. The external agent must acquire 
mastership of the System interface before it issues an external request; the external 
agent acquires mastership of the System interface by asserting EReq signal and 
then waiting for the processor to deassert PMaster signal for one cycle. 


[Figure: VR4300 and external agent. 1. The external system requests mastership by asserting the EReq signal. 2. The processor grants mastership by deasserting the PMaster signal. 3. The external system issues an external request. 4. The processor regains mastership when the EReq signal becomes inactive.]

Figure 12-9 External Request Flow


Mastership of the System interface always returns to the processor when EReq 
signal becomes inactive after an external request is issued. The processor does not 
accept a subsequent external request until it has completed the current request. 


Request Issuance 


If there are no processor requests pending, the processor decides, based on its 
internal state, whether to accept the external request, or to issue a new processor 
request. The processor can issue a new processor request even if the external 
agent is requesting access to the System interface. 


The external agent asserts EReq signal indicating that it wishes to begin an 
external request. The processor releases mastership of the System interface by 
deasserting PMaster signal. An external request can be accepted based on the 
criteria listed below. 
• The processor has completed any processor request in progress.

• While the processor is waiting for the EOK signal to be asserted so that it can
issue a processor read/write request, the EReq signal is input to the processor
one or more cycles before the EOK signal is asserted.

• The processor is waiting for the response to a read request after it has made
an uncompelled change to slave state (the external agent can issue an
external request before providing the read response data).


12.4.5 External Write Request 


When an external agent issues a write request, the specified processor resource is
accessed and the data is written to it. An external write request is completed after
the word of data has been transferred to the processor.


The only processor resource available to an external write request is the Interrupt 
register. 


12.4.6 Read Response 


A read response returns data in response to a processor read request. While a read
response is an external request, it has one characteristic that differentiates it from
all other external requests: it does not perform System interface arbitration
(that is, it does not request mastership of the System interface using the EReq signal).


[Figure: VR4300 and external agent. 1. Read request. 2. Read response.]

Figure 12-10 Read Response
12.5 Handling Requests


This section details the sequence, protocol, and syntax (refer to 12.1
Terminology for definitions of these terms) of both processor and external
requests. The following system events are discussed here:

• fetch miss
• load miss
• store miss
• loads/stores to uncached area
• CACHE instructions


12.5.1 Fetch Miss 


When the processor misses in the instruction cache on an instruction fetch, it
issues a read request to acquire the cache line. An external agent returns the data
as a read response.


12.5.2 Load Miss 


When the processor misses in the data cache on a load, it issues a read request to
acquire the cache line. An external agent returns the data as a read response.


If the cache data to be replaced is in the dirty state, this data is written to the 
memory. The above read operation must be completed before the data in the dirty 
state is written. 
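The ordering rule for a load miss (refill read first, dirty write-back second) can be sketched as a list of System interface requests in issue order. The `req_kind` values and `load_miss_requests` function are hypothetical, and the sketch assumes both the refill and the write-back are block transfers; it records ordering only, not timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request kinds, for illustration only. */
typedef enum { REQ_BLOCK_READ, REQ_BLOCK_WRITE } req_kind;

/* Fills `out` with the requests a data-cache load miss produces, in issue
   order, and returns how many there are. The refill read must complete
   before the dirty victim line is written back. */
static size_t load_miss_requests(bool victim_dirty, req_kind out[2])
{
    size_t n = 0;
    out[n++] = REQ_BLOCK_READ;      /* fetch the missing line first */
    if (victim_dirty)
        out[n++] = REQ_BLOCK_WRITE; /* then write back the old dirty line */
    return n;
}
```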


12.5.3 Store Miss 


If a processor store misses in the data cache, the processor issues a read request
to retrieve the target cache line. After the external agent has returned the target
line, the line is updated with the store data and written into the cache.


If the cache data to be replaced is in the dirty state, this data is written to the 
memory. The above read operation must be completed before the data in the dirty 
state is written. 
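For a store miss, the line returned by the read response is merged with the store data before it is written into the cache. A minimal sketch, assuming a 16-byte data-cache line (an assumption made only for this illustration) and using hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16u /* assumed data-cache line size for this sketch */

/* Builds the line to be written into the cache on a store miss: the refill
   data from the read response, overlaid with the store bytes. */
static void merge_store_into_line(uint8_t line[LINE_BYTES],
                                  const uint8_t refill[LINE_BYTES],
                                  unsigned offset,
                                  const uint8_t *store_data,
                                  unsigned store_len)
{
    assert(offset + store_len <= LINE_BYTES);     /* store must fit in the line */
    memcpy(line, refill, LINE_BYTES);             /* data from the read response */
    memcpy(line + offset, store_data, store_len); /* overlay the store bytes */
}
```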


When it is desirable to guarantee that cached data written by a store instruction is 
consistent with main memory contents, the corresponding cache line must be 
written back from the cache to the main memory using a CACHE instruction. 
CACHE instructions are described in Chapter 16 CPU Instruction Set Details. 
12.5.4 Loads or Stores to Uncached Area


When the processor performs a load to an uncached area, it issues a read request. An
external agent returns the data as a read response, in a single or block transfer.


When the processor performs a store to uncached area, it issues a write request and 
provides a single/block transfer of data to the external agent. 


12.5.5 CACHE Instructions 


The processor provides a variety of CACHE operations to maintain the state and
contents of the caches. During the execution of a CACHE instruction, the processor
can issue write requests unrelated to that CACHE instruction.
12.6 Processor Request and External Request Protocols


The following sections contain a cycle-by-cycle description of the bus arbitration 
protocols for each type of processor and external request. Table 12-1 lists the 
definitions and abbreviations for each of the buses that are used in the timing 
diagrams that follow. 


Table 12-1 System Interface Requests

Scope             Abbreviation   Meaning
Global            Unsd           Unused
SysAD(31:0) bus   Addr           Physical address
                  Data<n>        Data element number n of a block of data
SysCmd(4:0) bus   Cmd            An unspecified System interface command
                  Read           A processor or external read request command
                  Write          A processor or external write request command
                  EOD            A data identifier for the last data element
                  Data           A data identifier for any data element other than the last data element


12.6.1 Processor Request Protocols 


Processor request protocols described in this section include: 
• read

• write


12.6.2 Processor Read Request Protocol 


A processor read request is issued by outputting a read command on the 
SysCmd(4:0) bus and a read address on the SysAD(31:0) bus, and asserting 
PValid. Only one processor read request may be pending at a time; the processor 
must wait for an external read response before starting a subsequent read request. 


The processor makes an uncompelled change to slave state after the cycle of the 
read request by deasserting the PMaster signal. An external agent then returns 
the requested data through a read response. 
Once the processor enters slave state (starting at cycle 5 in Figure 12-11), the
external agent can return the requested data through a read response. The read
response returns the requested data or, if the requested data could not be
retrieved successfully, indicates on the SysCmd(4:0) bus that the returned data is
erroneous. If the returned data is erroneous, the processor generates a bus error
exception.


Figure 12-11 illustrates a processor read request, coupled with an uncompelled 
change to slave state, that occurs as the read request is issued. Figure 12-12 shows 
the processor read request delayed by the EOK signal. 


The following sequence describes the protocol for a processor read request (the 
numbered steps below correspond to Figures 12-11 and 12-12). 


1. The processor is in the master status. It outputs a read command to 
SysCmd(4:0) and a read address to SysAD(31:0) to issue a read request. 
After the read request is issued, the processor enters the pending status. Only 
one read request can be pending at a time. 


2. The processor asserts the PValid signal to indicate that the current data of 
SysCmd(4:0) and SysAD(31:0) are valid. 


3. The external agent asserts the EOK signal for two consecutive cycles to 
enable issuance of a processor read request. If the EOK signal is deasserted, 
the issuance cycle of the read request is delayed. 


4. The processor deasserts the PMaster signal in the first cycle after the read
request is accepted, making an uncompelled change to the slave status.


5. The processor releases SysCmd(4:0) and SysAD(31:0) at the same time as
the PMaster signal is deasserted.


6. An external agent can drive SysCmd(4:0) and SysAD(31:0) from the first
cycle after the PMaster signal is deasserted.
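Steps 4 to 6 fix the bus turnaround relative to the cycle in which the read request is accepted. The sketch below only does that arithmetic; the struct and function names are hypothetical, and cycle numbers are abstract system clock cycle counts.

```c
#include <assert.h>

/* Cycle numbers derived from steps 4 to 6 (names are illustrative). */
typedef struct {
    unsigned pmaster_deassert; /* processor deasserts PMaster (slave status) */
    unsigned bus_released;     /* processor releases SysCmd(4:0)/SysAD(31:0) */
    unsigned agent_may_drive;  /* earliest cycle the external agent may drive */
} read_turnaround;

static read_turnaround read_request_turnaround(unsigned accept_cycle)
{
    read_turnaround t;
    t.pmaster_deassert = accept_cycle + 1u;       /* first cycle after acceptance */
    t.bus_released     = t.pmaster_deassert;      /* same cycle as PMaster deassertion */
    t.agent_may_drive  = t.pmaster_deassert + 1u; /* one cycle after PMaster deassertion */
    return t;
}
```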
Figure 12-11 Unforcible Transition by Processor Read Request

Figure 12-12 Delayed Processor Read Request

12.6.3 Processor Write Request Protocol


A processor write request is issued by outputting a write command on the 
SysCmd(4:0) bus and a write address on the SysAD(31:0) bus, and asserting 
PValid signal. 


After that, the processor outputs a data identifier on SysCmd(4:0) and write data
on SysAD(31:0), and asserts the PValid signal during the cycles necessary to
transfer the data. The transfer rate is set by the EP bit of the Config register.


The data cycle differs depending on the size of the write request:

• 1 to 4 bytes: single data cycle

• 5 to 7 bytes: divided into two single write requests (one 4 bytes
long and the other 1 to 3 bytes long)

• 8 bytes or more: block data cycle in 4-byte units

The last data element is tagged with the data identifier EOD (End of Data).
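The three size cases above map a write of a given byte count to write requests and data cycles. This is a hedged sketch with hypothetical names; it assumes block writes are a whole number of 4-byte words and does not model the order of the two single writes in the 5 to 7 byte case, which this section does not specify.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned requests;    /* number of write requests issued */
    unsigned data_cycles; /* total data cycles across those requests */
    bool     is_block;    /* true if issued as one block write */
} write_plan;

static write_plan plan_write(unsigned bytes)
{
    write_plan p = {0u, 0u, false};
    assert(bytes >= 1u);
    if (bytes <= 4u) {              /* 1 to 4 bytes: one single data cycle */
        p.requests = 1u;
        p.data_cycles = 1u;
    } else if (bytes <= 7u) {       /* 5 to 7 bytes: two single writes (4 + 1..3) */
        p.requests = 2u;
        p.data_cycles = 2u;
    } else {                        /* 8 bytes or more: block write, 4-byte units */
        assert(bytes % 4u == 0u);   /* assumption: block sizes are word multiples */
        p.requests = 1u;
        p.data_cycles = bytes / 4u; /* the last data cycle carries EOD */
        p.is_block = true;
    }
    return p;
}
```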


Figure 12-13 shows the processor block write request by write data pattern D, and 
Figure 12-14 shows the processor block write request by write data pattern Dxx. 


The following sequence describes the protocol of the processor write request (the 
numbers correspond to the numbers in Figures 12-13 and 12-14). 


1. The processor is in the master status. It outputs a write command to 
SysCmd(4:0) and a write address to SysAD(31:0) to issue a write request. 


2. The processor asserts the PValid signal to indicate that the current data of 
SysCmd(4:0) and SysAD(31:0) are valid. 


3. The external agent asserts the EOK signal for two consecutive cycles to 
enable issuance of a processor write request. If the EOK signal is deasserted, 
the issuance cycle of the write request is delayed. 


4. The processor outputs a data identifier to SysCmd(4:0) and write data to 
SysAD(31:0). 


5. The processor asserts the PValid signal for the cycles necessary for data
transfer and transfers the data.


6. The last data element is tagged with the data identifier EOD.
Figure 12-13 Processor Block Write Request (Write Data Pattern: D)

Figure 12-14 Processor Block Write Request (Write Data Pattern: Dxx)

12.6.4 Flow Control of Processor Request


The external agent uses the EOK signal to control the flow of processor requests.
The processor repeats the current address cycle until the EOK signal is asserted.
The address cycle continues for one cycle after the EOK signal has been asserted,
and then the issuance cycle ends. The EOK signal must be asserted for at least two
consecutive cycles.


Figures 12-15 and 12-16 show how to use the EOK signal (the numbers in the
description below correspond to the numbers in Figures 12-15 and 12-16).


1. Because the EOK signal was inactive one cycle earlier, the processor request is
delayed, and the address cycle does not end.


2. Because the EOK signal was active one cycle earlier, the processor request is not
delayed, and the address cycle ends.
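The per-cycle EOK rule above, together with the discard rule in 12.8.1, can be sketched as a classification of each address cycle. Here `true` means EOK is asserted (active), independent of the pin's electrical polarity; the names are hypothetical, and the retry latency that follows a discarded command is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ADDR_DELAYED, ADDR_ACCEPTED, ADDR_DISCARDED } addr_cycle_result;

/* eok_prev: EOK asserted in the previous cycle.
   eok_now:  EOK asserted in the cycle the address/command is driven. */
static addr_cycle_result classify_address_cycle(bool eok_prev, bool eok_now)
{
    if (!eok_prev)
        return ADDR_DELAYED;   /* not an issuance cycle: the address cycle repeats */
    if (eok_now)
        return ADDR_ACCEPTED;  /* EOK asserted for two consecutive cycles */
    return ADDR_DISCARDED;     /* issued, but EOK dropped: command is discarded */
}
```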
Figure 12-15 Delayed Processor Read Request

Figure 12-16 Delayed Second Processor Write Request


12.6.5 External Request Protocols 
External requests can only be issued with the System interface in slave state.
The external agent must assert the EReq signal to arbitrate for the System
interface (refer to 12.6.6 External Arbitration Protocol), and then wait for the
processor to release the System interface to slave state. If the System interface is
already in slave state (that is, the processor has previously performed an
uncompelled change to slave state), the external agent can begin an external
request immediately.


After issuing an external request, the external agent must return mastership of the 
System interface to the processor, as described below. 


Following the description of the arbitration protocol, this section also describes 
the following external request protocols: 


• write

• read response
12.6.6 External Arbitration Protocol


Usually, the processor is the bus master. However, the processor relinquishes
control of the bus and enters the slave status in the following cases:

• When the external agent issues a request and the System interface
responds to that request

• After the processor has issued a read request

Arbitration that moves the processor from the master status to the slave status is
realized by using the handshake signals of the System interface (EReq, PReq,
and PMaster).


Status Transition On Read Response 


While a processor read request is pending, the processor enters the slave
status by deasserting the PMaster signal, and the external agent returns the
read response data.


If the EReq signal is inactive, the processor remains in the slave status until the
read response data is returned, and then returns to the master status by asserting
the PMaster signal.


If the EReq signal is active when the read response is returned, the external agent
can remain in the master status for as long as the EReq signal remains active.


Acquiring Bus Mastership by EReq Signal 


To issue an external request while the processor is in the master status, the
external agent asserts the EReq signal and waits until the processor deasserts the
PMaster signal. When the processor deasserts the PMaster signal, the external
agent acquires bus mastership.


Once the external agent has entered the master status, it can remain in the master
status as long as the EReq signal is asserted. When the EReq signal is deasserted,
the processor acquires bus mastership two cycles later.


Figure 12-17 shows the arbitration protocol of the external request issued by the 
external agent. 


The following sequence describes the arbitration protocol (the numbers in the 
sequence correspond to the numbers in Figure 12-17). 
1. The external agent asserts the EReq signal, and keeps it asserted, to issue an
external request.


2. When the processor is ready to process the external request, it deasserts the
PMaster signal.


3. The processor places SysAD(31:0) and SysCmd(4:0) in the high-impedance
state.


4. The external agent should drive SysAD(31:0) and SysCmd(4:0) starting one
cycle after the PMaster signal has been deasserted.


5. The external agent should deassert the EReq signal in the last cycle of the
external request (2 cycles before the external agent enters the slave status),
unless it is going to issue another external request.


6. The external agent should place SysAD(31:0) and SysCmd(4:0) in the
high-impedance state on completion of the external request.
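The arbitration sequence above can be summarized as a function that reports which side drives SysAD(31:0) and SysCmd(4:0) in a given cycle. This is a simplified sketch under stated assumptions: `grant` is the cycle in which the processor deasserts PMaster, `last` is the last cycle of the external request (when EReq is deasserted), the agent releases the buses right after `last`, and the processor drives again from the cycle in which it reasserts PMaster, two cycles after EReq is deasserted. All names are hypothetical.

```c
#include <assert.h>

typedef enum { OWNER_PROCESSOR, OWNER_NONE, OWNER_AGENT } bus_owner;

/* grant: cycle in which the processor deasserts PMaster.
   last:  last cycle of the external request (EReq deasserted here). */
static bus_owner sysad_owner(unsigned cycle, unsigned grant, unsigned last)
{
    assert(grant < last);      /* the request needs at least one agent cycle */
    if (cycle < grant)
        return OWNER_PROCESSOR; /* processor is master and drives the buses */
    if (cycle == grant)
        return OWNER_NONE;      /* processor has placed the buses in high impedance */
    if (cycle <= last)
        return OWNER_AGENT;     /* agent drives from one cycle after PMaster drops */
    if (cycle < last + 2u)
        return OWNER_NONE;      /* agent has released the buses */
    return OWNER_PROCESSOR;     /* PMaster reasserted two cycles after EReq drops */
}
```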
Figure 12-17 Arbitration of External Request


If the external agent entered the master status because the processor issued a read
request, the external agent must always return the read response data. If the external
agent entered the master status by using the EReq signal, it can issue any command
and data in accordance with the arbitration process. This means that the
processor always services any request the external agent issues.
Restoring Bus Mastership by PReq Signal


Once the external agent has entered the master status, the processor cannot stop
the operation of the external agent. However, the processor can request bus
mastership by asserting the PReq signal. The external agent must then deassert
the EReq signal in response to the processor's request, taking the system's
mastership priorities into account.


The processor asserts the PMaster signal two cycles after the EReq signal has been
deasserted to inform the external agent that the processor has regained bus
mastership.


Figure 12-18 illustrates how the processor requests the bus mastership and how 
the external agent releases the bus in response. 


At reset (when the Reset or ColdReset signal is active), the processor enters the 
master status, and the external agent enters the slave status. 
Figure 12-18 Bus Arbitration of Processor

12.6.7 External Write Request Protocol




External write requests are similar in operation to a processor single write except 
that the EValid signal is asserted in place of the PValid signal. 


An external write request is issued, while the processor is in slave state, by
outputting a write command on the SysCmd(4:0) bus and a write address on the
SysAD(31:0) bus and asserting the EValid signal for one cycle. This is followed by
outputting a data identifier on the SysCmd(4:0) bus and data on the SysAD(31:0)
bus and asserting the EValid signal for one more cycle. The data identifier of the
data cycle must indicate the end of data (EOD).
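An external write request therefore occupies two EValid cycles: an address cycle carrying the Write command, and one data cycle tagged EOD. The sketch builds those two cycles symbolically; `CMD_WRITE` and `CMD_EOD` are placeholders rather than the real SysCmd(4:0) bit encodings, and the address and data values a caller passes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { CMD_WRITE, CMD_EOD } syscmd_sym; /* symbolic, not bit encodings */

typedef struct {
    syscmd_sym cmd;    /* what the agent drives on SysCmd(4:0) */
    uint32_t   sysad;  /* what the agent drives on SysAD(31:0) */
    bool       evalid; /* EValid asserted in this cycle */
} bus_cycle;

/* Builds the two cycles of an external write request. */
static void build_external_write(uint32_t addr, uint32_t data, bus_cycle out[2])
{
    out[0] = (bus_cycle){ CMD_WRITE, addr, true }; /* address cycle */
    out[1] = (bus_cycle){ CMD_EOD,   data, true }; /* single data cycle, tagged EOD */
}
```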


Keep the EReq signal active while the external write request is issued. 


After the data cycle is issued, the write request is completed and the external agent 
releases the SysCmd(4:0) and SysAD(31:0) buses and allows the system 
interface to return to master state. 


An external write request issued starting with the processor in master state is
illustrated in Figure 12-19.


Figure 12-22 shows an example in which the external agent issues an external
write request following a read response. The external write request cannot be
issued while read response data is being transferred. It can be issued before the
read response data or after the last word of response data.


User's Manual U10504EJ7VOUM00 


System Interface 


[Timing diagram: master, slave, then master phases over SCycles 1 to 12; SysAD(31:0) carries Addr then Data; SysCmd(4:0) carries Write then EOD; signals shown: PValid (output), PMaster (output), EValid (input), EReq (input).]


Figure 12-19 External Write Request Protocol


The processor uses external write requests only for interrupt processing, because
the Interrupt register is the only processor resource they can write.


12.6.8 External Read Response Protocol 


An external agent returns data to the processor in response to a processor read 
request by waiting for the processor to move to slave state, and then returning the 
data through a single data cycle or a number of data cycles sufficient for the 
requested data size. 


The SysCmd(4:0) and SysAD(31:0) buses are released after the last data cycle is
issued. If the EReq signal is inactive at this time, the processor returns to master
state two cycles after the last data cycle.


The data identifier associated with a data cycle may indicate that the data transferred
during this cycle is erroneous; however, the external agent must return a data block
of the requested size whether or not the data is erroneous. If a read response
includes one or more erroneous data cycles, the processor generates a bus error
exception.
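The receiving side of a read response can be sketched as a loop over sampled bus cycles. Following the rules above and in 12.9, only cycles with EValid asserted and a data identifier on SysCmd(4:0) are accepted, collection stops at the EOD-tagged word, and any erroneous cycle means the processor takes a bus error exception. The types and identifiers are hypothetical and symbolic, not the real SysCmd(4:0) encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ID_NONE, ID_DATA, ID_EOD } data_id; /* symbolic identifiers */

typedef struct {
    bool     evalid;    /* EValid sampled asserted */
    data_id  id;        /* identifier on SysCmd(4:0) */
    bool     erroneous; /* identifier marks this data as erroneous */
    uint32_t sysad;     /* word on SysAD(31:0) */
} resp_cycle;

/* Collects response words up to and including the EOD-tagged word and
   returns how many were stored; sets *bus_error if any was erroneous. */
static size_t receive_read_response(const resp_cycle *cycles, size_t n,
                                    uint32_t *words, size_t max_words,
                                    bool *bus_error)
{
    size_t count = 0;
    *bus_error = false;
    for (size_t i = 0; i < n; i++) {
        const resp_cycle *c = &cycles[i];
        if (!c->evalid || c->id == ID_NONE)
            continue;              /* not a valid data cycle: ignore it */
        assert(count < max_words); /* caller sized the buffer for the request */
        words[count++] = c->sysad;
        if (c->erroneous)
            *bus_error = true;     /* processor would raise a bus error exception */
        if (c->id == ID_EOD)
            break;                 /* last word: the read request is complete */
    }
    return count;
}
```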


Read response data can be transferred to the processor only when a processor read 
request is pending. If a read response is transferred to the processor while no 
processor read request is pending, the operation of the processor is undefined. 
A processor single read request followed by a read response is illustrated in Figure
12-20. A read response for a processor block read with the processor already in 
slave state is illustrated in Figure 12-21. 
Figure 12-21 Block Read Response in Slave Status

Figure 12-22 shows the case where an external write request is issued following a
read response to a processor single read request. The following sequence 
describes the protocol (the numbers in the following description correspond to the 
numbers in Figure 12-22). 


1. The external agent returns response data to the processor single read request. 


2. To issue an external request following the read response, assert the EReq
signal in the cycle in which EOD is returned. In this case, the PMaster
signal is not reasserted two cycles after EOD; it remains inactive.


3. Because the external agent is in the master status, it can issue the external
write request.


4. Deassert the EReq signal no later than the data cycle of the external write
request. In this case, the PMaster signal is asserted two cycles after the
EOD of the external write request, and bus mastership is returned to the processor.
Figure 12-22 External Write Request Following Read Response

Figure 12-23 shows an example in which an external write request interrupts a
read response to a processor single read request. Cycle 5 in the figure is the write 
data for the external write request in cycle 4, and cycle 7 is the read response data. 
Figure 12-23 When External Write Request Takes Precedence While Processor Read Request is Pending


As shown in this figure, even if the external request interrupts the processor read 
request, the processor remains in the slave status until the read response data is 
returned. 
12.7 Successive Processing of Request


12.7.1 Successive Processor Write Requests 


Successive processor write requests are handled as follows.


• Write data pattern “D”
In this case, the processor write requests are processed without wait
states, as shown in Figure 12-24.


• Write data pattern “Dxx”
In this case, successive requests are separated by a wait of two cycles,
as shown in Figure 12-25.


Processor write requests can be issued successively in the following four
cases:

1. Successive single write requests 

2. Successive block write requests 

3. Block write request after single write request 

4. Single write request after block write request 


For the timing of the processor single write request, refer to 12.6.3 Processor 
Write Request Protocol. 
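The difference between the two write data patterns can be expressed as a data-cycle offset within a block write. This sketch assumes (it is not stated in this section) that pattern D drives one data word every cycle and that pattern Dxx drives one data word followed by two idle cycles, starting in the cycle right after the address cycle; the names are hypothetical.

```c
#include <assert.h>

typedef enum { PATTERN_D, PATTERN_DXX } write_pattern;

/* Cycle offset, counted from the address cycle, of data word i (0-based)
   of a processor block write under each assumed pattern. */
static unsigned data_cycle_offset(write_pattern p, unsigned i)
{
    /* D: a data word every cycle; Dxx: a data word, then two idle cycles. */
    unsigned stride = (p == PATTERN_D) ? 1u : 3u;
    return 1u + i * stride;
}
```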


Figure 12-24 Successive Block Write Requests (Write Data Pattern: D)

Figure 12-25 Successive Single Write Requests (Write Data Pattern: Dxx)

12.7.2 Processor Write Request Followed by Processor Read Request


Figure 12-26 shows the case where a processor read request follows a processor 
write request. 


[Timing diagram: master, slave, then master phases over SCycles 1 to 12; SysAD(31:0) carries Addr, Data0, Data1 for the write, then Addr for the read, followed by the response Data; SysCmd(4:0) carries Write, Data, EOD, then Read, followed by EOD.]


Figure 12-26 Processor Write Request Followed by Processor Read Request 
(Write Data Pattern: D) 
12.7.3 Processor Read Request Followed by Processor Write Request

Figure 12-27 shows the case where a processor read request is followed by a
processor write request.

Figure 12-27 Processor Single Read Request Followed by Block Write Request
(Write Data Pattern: D)
12.7.4 Processor Write Request Followed by External Write Request


Figure 12-28 shows the case where processor write requests are followed by an 
external write request. 
Figure 12-28 Successive Processor Write Requests Followed by External Write Request
(Write Data Pattern: D) 
12.8 Discarding and Re-Executing Commands


12.8.1 Re-Execution of Processor Commands 


The external agent controls the execution of processor commands by using the
EOK signal. When the processor is the master, it cannot issue a command until
the EOK signal has been active for at least two cycles.


If the EOK signal is active for only one cycle before the processor issues a
command and then becomes inactive in the cycle in which the command is
issued, that processor command is discarded. The external agent should ignore
the discarded command.


If Write Command is Discarded


The processor still outputs the write data and then issues the write command
again. The external agent should ignore the write data that follows the discarded
write command.


If Read Command is Discarded


The processor enters the slave status in the cycle following the address cycle of the
read request. If the EReq signal is inactive at this time, the processor returns to
the master status one cycle later and reissues the read request.


12.8.2 Discarding and Re-Executing Write Command 


Figure 12-29 illustrates how a processor single write request is discarded and
re-executed. The following sequence describes the protocol (the numbers in the
following description correspond to the numbers in Figure 12-29).


1. Because the EOK signal is active in the cycle before the write request of Data0
(cycle 2), the write request cycle is the issuance cycle.


2. Because the EOK signal is active in the write request cycle of Data0 (cycle 3),
the next cycle is a normal data cycle.


3. Because the EOK signal is active in the cycle before the write request of Data1
(cycle 4), the write request cycle is the issuance cycle.


4. Because the EOK signal is inactive in the write request cycle of Data1 (cycle 5),
the data in the next cycle is discarded. Data and a command are still output on
SysAD(31:0) and SysCmd(4:0) in that cycle, and the external agent should ignore
them.


5. Because the EOK signal is inactive in the cycle before the write request of the
second Data1 (cycle 6), the write request is delayed.


6. Because the EOK signal is active in the cycle before the write request of the
second Data1 (cycle 9), that write request cycle is the issuance cycle.


7. Because the EOK signal is active in the write request cycle of the second Data1
(cycle 10), the next cycle is a normal data cycle.
Figure 12-29 Discarding and Re-executing Processor Single Write Request

12.8.3 Discarding and Re-Executing Read Command

Figure 12-30 illustrates how a processor single read request is discarded and
re-executed. The following sequence describes the protocol (the numbers in the
following description correspond to the numbers in Figure 12-30).


1. Because the EOK signal is low in cycle 5, the processor tries to issue an 
address (cycle 6). 


2. If the EOK signal is high (inactive) at this point, the processor discards this read
request and enters the slave status in the next cycle.


3. Because the EReq signal is inactive, the processor returns to the master status
and reissues the read request. Because the EOK signal is low in both cycles 7
and 8, the issuance cycle of the read request is determined.


4. The external agent outputs data at the requested address. 
Figure 12-30 Discarding and Re-executing Processor Single Read Request

12.8.4 Executing and Discarding Command


When External Agent Requests Bus Mastership 


The external agent requests bus mastership by asserting the EReq signal. The
external agent can then acquire bus mastership either after accepting only one
processor read or write request, or without accepting any request.


If the EReq signal is asserted while the external agent is delaying a processor
request by keeping the EOK signal inactive, the external agent can forcibly
acquire bus mastership.


When Processor Requests Bus Mastership 




The processor requests bus mastership by asserting the PReq signal. The external
agent should then transfer bus mastership to the processor, taking the priorities
of the system into account. If the external agent keeps the EReq signal inactive
for more than one cycle, the bus is released.


The processor acquires the bus mastership by asserting the PMaster signal active 
two cycles after the EReq signal has become inactive. If the EOK signal is active 
at this time, the processor can issue a request. 


Figure 12-31 shows an example where the external agent has entered the slave 
status (the EReq signal is inactive) from the master status, and then acquires the 
bus mastership again after accepting one processor request. 
Figure 12-31 Discarding Bus Mastership by External Agent by Processor Request
12.9 Data Flow Control


The system interface supports a maximum data rate of one word per cycle. 


Read Response 


An external agent may transfer data to the processor at the maximum data rate of
the System interface. The external agent controls the rate at which data is
transferred to the processor by asserting the EValid signal in each cycle in which
data is transferred. The processor accepts a cycle as valid only when the EValid
signal is asserted and the SysCmd(4:0) bus contains a data identifier; thereafter,
the processor continues to accept data until it receives the data word tagged as
the last one.


Data identifier EOD must be attached to the last data word. Without it, the
System interface hangs up with a protocol error. This protocol error state can be
identified by the PReq signal, which toggles with a period of twice the SClock
cycle in synchronization with MasterClock. In this case, the processor should be
reset and initialized.


Write Request 


The rate at which the processor transfers data to an external agent is
programmable through the EP bit of the Config register (set to D at reset).
Data patterns are defined using the letters D and x, where D indicates a
data cycle and x indicates an unused cycle. For example, a Dxx data pattern
indicates a data rate of one word every three cycles.


The VR4300 has two data transfer rates: D and Dxx. During each x cycle, while
the processor is in the master status, it continues to output the data of the
immediately preceding D cycle.


A processor block write request with a Dxx data pattern (one word every three 
cycles) is shown in Figure 12-14. 
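To make the D/Dxx notation concrete, the following minimal C sketch computes the cycle in which each data word of a write appears for a given pattern. It assumes that the pattern simply repeats starting at the first data cycle (counted as cycle 0) and ignores the address cycle and any delays caused by EOK; the function name word_cycle is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the cycle, counted from the first data cycle
 * (cycle 0), in which data word k is driven for a repeating pattern such as
 * "D" or "Dxx". 'D' marks a data cycle and 'x' an unused cycle.
 * Assumes the pattern repeats from cycle 0 and ignores EOK-induced delays. */
unsigned word_cycle(const char *pattern, unsigned k)
{
    size_t len = strlen(pattern);
    assert(len > 0 && strchr(pattern, 'D') != NULL);

    unsigned seen = 0;                 /* data words found so far */
    for (unsigned cycle = 0;; cycle++) {
        if (pattern[cycle % len] == 'D') {
            if (seen == k)
                return cycle;
            seen++;
        }
    }
}
```

For a 4-word block write, the last word (k = 3) falls in cycle 3 with pattern D and in cycle 9 with pattern Dxx.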
12.9.1 Independent Transfer on SysAD(31:0) Bus


In general applications, the SysAD(31:0) bus is a point-to-point connection
running from the processor to a bidirectional register transceiver residing in an
external agent. In these applications, only two devices are connected to the
SysAD(31:0) bus: the processor and the external agent.


Certain applications may require additional drivers and receivers to be connected
to the SysAD(31:0) bus, to allow transfers over the SysAD(31:0) bus in which the
processor is not involved. These are called independent transfers. To effect an
independent transfer, the external agent must coordinate mastership of the
SysAD(31:0) bus by using the arbitration handshake signals (EReq, PMaster, and
PReq).


An independent transfer on the SysAD(31:0) bus follows this procedure: 


1. The external agent asserts EReq signal, and requests mastership of the 
SysAD(31:0) bus, to issue an external request. 


2. The processor deasserts PMaster signal, and releases the System interface to
slave state.


3. The external agent then allows the independent transfer to take place on the 
SysAD(31:0) bus, making sure that EValid signal is not asserted during the 
transfer. 


4. When the transfer is completed, the external agent deasserts EReq signal to 
return the System interface to master state. 


To connect multiple devices, separate input and output enable signals are
required for each device so that the non-processor chips can communicate.


12.9.2 System Endianness 


The endianness of the system is set by the BE bit of the Config register: byte order
is big endian when this bit is set to 1, and little endian when this bit is set to 0.
This bit is set to 1 at cold reset. In a little-endian system, clear this bit to 0 at
the beginning of the initialization sequence.


Software can set the reverse endian (RE) bit in the Status register to one to reverse 
the User mode byte ordering during operation. 
12.10 System Interface Cycle Time


The processor specifies minimum and maximum cycle counts for the time 
required for various processor transactions and for the processor response time to 
external requests. Processor requests themselves are constrained by the System 
interface protocol, and request cycle counts can be determined by examining the 
protocol. The following System interface interactions can vary within minimum 
and maximum cycle counts: 


* waiting period for the processor to release the System interface to 
slave state in response to an external request (release latency). 


The remainder of this section describes and tabulates the minimum and maximum 
cycle counts for these System interface interactions. 


12.10.1 Release Latency Time 


Release latency time is defined as the number of cycles the processor can wait
before releasing the System interface to slave state in response to an external
request. Even when no processor requests are in progress, internal activity can
cause the processor to wait some number of cycles before releasing the System
interface. Release latency time is therefore measured from the cycle in which the
EReq signal becomes active until the cycle in which the PMaster signal becomes
inactive.


There are two categories of release latency time: 


* Category 1: when the EReq signal is asserted no later than one cycle
before the last cycle of a processor request.


* Category 2: when the EReq signal is not asserted during a processor 
request, or is asserted during the last cycle of a 
processor request. 


Table 12-2 shows the minimum and maximum release latency time for requests 
that fall into categories 1 and 2. Note that the maximum and minimum cycle 
counts are subject to change. 


Table 12-2 Release Latency Time for External Requests


Category   Minimum PCycles   Maximum PCycles
1          4                 6
2          4                 24
12.11 System Interface Commands and Data Identifiers


System interface commands specify the types and attributes of any System 
interface request; this specification is made during the address cycle for the 
request. 


System interface data identifiers specify the attributes of data transferred during a 
System interface data cycle. 


The following sections describe the syntax, that is, the bitwise encoding of System 
interface commands and data identifiers. 


Reserved bits and reserved fields should be set to 1 for System interface 
commands and data identifiers associated with external requests. 


For System interface commands and data identifiers associated with processor 
requests, reserved bits and reserved fields in the commands and data identifiers are 
undefined. 


12.11.1 Command and Data Identifier Syntax 


System interface commands and data identifiers are encoded in 5 bits and are 
transferred on the SysCmd(4:0) bus from the processor to an external agent, or 
from an external agent to the processor, during address and data cycles. 


Bit 4 (the most-significant bit) of the SysCmd(4:0) bus determines whether the 
current content of the SysCmd bus is a command or a data identifier and, 
therefore, whether the current cycle is an address cycle or a data cycle. For 
System interface commands, SysCmd4 must be set to 0. For System interface data 
identifiers, SysCmd4 must be set to 1. 


Bit Meaning 


SysCmd4 Attributes. 
0: Command (address) 
1: Data identifier 
12.11.2 System Interface Command Syntax


This section describes the SysCmd(4:0) bus encoding for System interface 
commands. Figure 12-32 shows a common encoding used for all System interface 
commands. 


(Figure: SysCmd4 = 0 for a command; SysCmd3 gives the request type; SysCmd(2:0) give the request details.)


Figure 12-32 System Interface Command Syntax Bit Definition


SysCmd4 must be set to 0 for all System interface commands. 


SysCmd3 specifies the System interface request type, which may be read or write.


Table 12-3 Encoding of SysCmd3 for System Interface Commands 


Bit Meaning 


SysCmd3 Command. 
0: Read Request 
1: Write Request 


SysCmd(2:0) are specific to each type of request and are defined in each of the 
following sections. 


12.11.3 Read Requests 


For read requests, SysCmd(2:0) are encoded as follows.


Figure 12-33 shows the format of a SysCmd read request. 


(Figure: SysCmd(2:0) carry the read request details; see Tables 12-4 through 12-6.)


Figure 12-33 Read Request SysCmd(4:0) Bus Bit Definition
Tables 12-4 through 12-6 list the encodings of the SysCmd(2:0) read attributes
for read requests.


Table 12-4 Encoding of SysCmd2 for Read Requests


Bit          Meaning
SysCmd2      Read Attributes.
             0: Single Read
             1: Block Read


Table 12-5 Encoding of SysCmd(1:0) for Block Read Requests


Bit          Meaning
SysCmd(1:0)  Read Block Size.
             0: 2 words
             1: 4 words (D-cache only)
             2: 8 words (I-cache only)
             3: Reserved


Table 12-6 Encoding of SysCmd(1:0) for Single Read Requests


Bit          Meaning
SysCmd(1:0)  Read Data Size.
             0: 1 byte valid (Byte)
             1: 2 bytes valid (Halfword)
             2: 3 bytes valid
             3: 4 bytes valid (Word)
12.11.4 Write Requests


The encoding of SysCmd(2:0) for write requests is shown below.


Figure 12-34 shows the format of a SysCmd write request. 


Table 12-7 lists the write attributes encoded in bit SysCmd2. Table 12-8 lists the
block write sizes encoded in bits SysCmd(1:0). Table 12-9 lists the single write
data sizes encoded in bits SysCmd(1:0).


(Figure: SysCmd(2:0) carry the write request details; see Tables 12-7 through 12-9.)


Figure 12-34 Write Request SysCmd(4:0) Bus Bit Definition


Table 12-7 Encoding of SysCmd2 for Write Requests


Bit          Meaning
SysCmd2      Write Attributes.
             0: Single Write
             1: Block Write


Table 12-8 Encoding of SysCmd(1:0) for Block Write Requests


Bit          Meaning
SysCmd(1:0)  Write Block Size.
             0: 2 words
             1: 4 words (for D-cache only)
             2: 8 words (for I-cache only) (for test)
             3: Reserved


Table 12-9 Encoding of SysCmd(1:0) for Single Write Requests


Bit          Meaning
SysCmd(1:0)  Write Data Size.
             0: 1 byte valid (Byte)
             1: 2 bytes valid (Halfword)
             2: 3 bytes valid
             3: 4 bytes valid (Word)
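The command encodings in Tables 12-3 through 12-9 combine into a single 5-bit value. The following minimal C sketch assembles that value from the request type, the single/block attribute, and the size code; the function and constant names are hypothetical and only the bit positions come from the tables above.

```c
#include <assert.h>
#include <stdint.h>

/* Field values from Tables 12-3 through 12-9 (names are hypothetical). */
enum { REQ_READ = 0, REQ_WRITE = 1 };                  /* SysCmd3 */
enum { REQ_SINGLE = 0, REQ_BLOCK = 1 };                /* SysCmd2 */
enum { BLK_2_WORDS = 0, BLK_4_WORDS = 1, BLK_8_WORDS = 2 }; /* SysCmd(1:0), block */

/* SysCmd(1:0) for single requests: number of valid bytes minus 1. */
static uint8_t single_size_code(unsigned valid_bytes)
{
    assert(valid_bytes >= 1 && valid_bytes <= 4);
    return (uint8_t)(valid_bytes - 1);
}

/* Build a 5-bit System interface command; SysCmd4 is 0 for commands. */
uint8_t syscmd_command(unsigned type, unsigned block, unsigned size_code)
{
    assert(type <= 1 && block <= 1 && size_code <= 3);
    assert(!(block == REQ_BLOCK && size_code == 3));   /* block size 3 is reserved */
    return (uint8_t)((0u << 4) | (type << 3) | (block << 2) | size_code);
}
```

For example, a single word read encodes as 0x03, an 8-word (I-cache) block read as 0x06, and a 4-word block write as 0x0D.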
12.11.5 System Interface Data Identifier Syntax


This section defines the encoding of the SysCmd(4:0) bus for System interface 
data identifiers. Figure 12-35 shows a common encoding used for all System 
interface data identifiers. 


(Figure: SysCmd4 = 1 marks a data identifier; SysCmd(3:1) flag last data, response data, and error data.)


Figure 12-35 Data Identifier SysCmd(4:0) Bus Bit Definition


SysCmd4 must be set to 1 for all System interface data identifiers.


12.11.6 Data Identifier Bit Definitions 
Bit definitions of SysCmd(3:0) are described next. 
SysCmd3 marks the last data element. 


SysCmd2 indicates whether or not the data is response data. Response data is data 
returned in response to a read request. 


SysCmd1 indicates whether or not the data element is error free. Erroneous data
contains an uncorrectable error and is returned to the processor, resulting in a bus
error exception. Because the VR4300 does not have a parity check function, the
processor never transfers data with the error bit set to 1.


SysCmd0 enables data checking (reserved function).

Because the VR4300 does not have a data check function, the processor outputs 1
(data check disabled) when it transfers data. When the external agent transfers data,
the processor ignores this bit; nevertheless, the external agent should set this bit to 1
to disable checking.


Table 12-10 lists the encodings of SysCmd(3:0) for processor data identifiers. 
Table 12-11 lists the encodings of SysCmd(3:0) for external data identifiers. 
Table 12-10 Processor Data Identifier Encoding of SysCmd(3:0)


Bit Meaning 
SysCmd3 Last Data Element Indication. 
0: Last data element, or data element on single transfer 
1: Not the last data element 
SysCmd2 Reserved 
SysCmd1 Reserved: Error Data Indication. 
The processor outputs 0 (error free). 
SysCmd0 Reserved: Data check enabled 
Processor outputs 1 (data check disabled). 


Table 12-11 External Data Identifier Encoding of SysCmd(3:0)


Bit       Meaning
SysCmd3   Last Data Element Indication.
          0: Last data element, or data element on single transfer
          1: Not the last data element
SysCmd2   Response Data Indication.
          0: Data is response data
          1: Data is not response data
SysCmd1   Error Data Indication.
          0: Data is error free
          1: Data is erroneous
SysCmd0   Reserved: Data Checking Enable.
          The processor ignores this bit (the external agent transfers 1).
12.12 System Interface Addresses


System interface addresses are full 32-bit physical addresses output to the 
SysAD(31:0) bus during address cycles. 


12.12.1 Addressing Conventions 


Addresses associated with word or partial word data transfers are aligned for the 
size of the data element. The system uses the following address conventions: 


* Addresses associated with block requests are aligned to doubleword
boundaries; that is, the low-order 3 bits of the address are 0.


* Word requests set the low-order 2 bits of the address to 0.
* Halfword requests set the low-order bit of the address to 0.


* Byte and tribyte requests use the byte address.
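The addressing conventions above can be expressed as simple mask checks. This minimal C sketch tests a 32-bit physical address against only the rules listed in this section; the enumeration and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of System interface transfers. */
typedef enum { XFER_BLOCK, XFER_WORD, XFER_HALFWORD, XFER_TRIBYTE, XFER_BYTE } xfer_kind;

/* Return true if addr follows the addressing conventions of 12.12.1. */
bool address_is_aligned(uint32_t addr, xfer_kind kind)
{
    switch (kind) {
    case XFER_BLOCK:    return (addr & 0x7u) == 0;  /* doubleword boundary */
    case XFER_WORD:     return (addr & 0x3u) == 0;  /* low-order 2 bits are 0 */
    case XFER_HALFWORD: return (addr & 0x1u) == 0;  /* low-order bit is 0 */
    case XFER_TRIBYTE:
    case XFER_BYTE:     return true;                /* plain byte address */
    }
    assert(!"unknown transfer kind");
    return false;
}
```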


12.12.2 Sequential and Subblock Ordering 


Sequential Ordering 


An instruction cache read request returns data in sequential order, starting with the
first word (DW0) of the 8-word block, no matter which word is requested.


Subblock Ordering 


When a read request is issued for the data cache, the low-order word of the
doubleword that includes the word required by the CPU is returned first, followed
by the high-order word of that doubleword, then the low-order word of the
remaining doubleword, and finally the high-order word of the remaining
doubleword (for details, refer to 12.2.1 Physical Addresses).
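The subblock ordering rule for a 4-word data cache block can be sketched in C as follows. To stay independent of byte ordering, the sketch numbers words as 2 * dw + pos, where dw (0 or 1) selects the doubleword and pos is 0 for its low-order word and 1 for its high-order word; how these positions map to byte addresses depends on the endianness described in 12.2.1. The function name is hypothetical. (An instruction cache fill, by contrast, always returns words 0 through 7 in sequence.)

```c
#include <assert.h>

/* Fill order[0..3] with the return order of a 4-word data cache block.
 * Word numbering: 2*dw + pos, pos 0 = low-order word, pos 1 = high-order word. */
void dcache_subblock_order(unsigned requested, unsigned order[4])
{
    assert(requested < 4);
    unsigned dw = requested / 2;   /* doubleword holding the requested word */
    unsigned other = dw ^ 1u;      /* the remaining doubleword */

    order[0] = 2 * dw;             /* low-order word of the requested doubleword */
    order[1] = 2 * dw + 1;         /* then its high-order word */
    order[2] = 2 * other;          /* low-order word of the remaining doubleword */
    order[3] = 2 * other + 1;      /* high-order word of the remaining doubleword */
}
```

For a request that falls in word 3 (the high-order word of doubleword 1), the words come back in the order 2, 3, 0, 1.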
The VR4300 processor is provided with a boundary-scan interface that is
compatible with Joint Test Action Group (JTAG) specifications, conforming to the
industry-standard JTAG protocol (IEEE Standard 1149.1/D6).


This chapter describes the functions related to the JTAG interface.


User's Manual U10504EJ7VOUM00 341 


Chapter 13 


13.1 Principles of Boundary Scanning 
With the evolution of integrated circuits (ICs), surface-mounted devices,
double-sided component mounting on printed-circuit boards (PCBs), and via-hole
technology, in-circuit testing of boards and chips has become more and more
difficult to perform. The greater complexity of ICs has also made the test
patterns needed to exercise all the circuits in a chip much larger and more
difficult to write.


One solution to this difficulty has been the development of test methods using
boundary-scan circuits. A boundary-scan circuit is a shift register made up of
a series of connected cells placed between each pin of the chip and the internal
circuitry of the IC, as shown in Figure 13-1. In normal operation these
boundary-scan cells are bypassed; in test mode, however, the scan cells are
directed by the test program to pass data along the shift register path and perform
various diagnostic tests. To accomplish this, the tests use the four signals
described in the next section: JTDI, JTDO, JTMS, and JTCK.
Figure 13-1 JTAG Boundary-Scan Cells
13.2 Signal Summary


The JTAG interface signals used are listed below. 


JTDI JTAG serial data input 
JTDO JTAG serial data output 
JTMS JTAG test mode select 
JTCK JTAG serial clock input 


Caution When the JTAG interface is not used, keep the JTCK signal low. 


(Figure: the JTDI, JTDO, JTMS, and JTCK pins with the TAP controller, Instruction register, Bypass register, and Boundary-scan register.)


Figure 13-2. JTAG Interface Signals and Registers


The JTAG boundary-scan mechanism (referred to as the JTAG mechanism in this
chapter) allows testing of the connections between the processor, the printed
circuit board to which it is attached, and the other devices on the board.


The JTAG mechanism does not provide any capability for testing the processor 
itself. 
13.3 JTAG Controller and Registers


The processor contains the following registers and JTAG controller: 
* Instruction register
* Boundary-scan register
* Bypass register
* Test Access Port (TAP) controller


The processor executes the standard JTAG EXTEST (external test) operation.


The basic operation of JTAG is for the TAP controller state machine to monitor
the JTMS input signal, as shown in Table 13-1. When it starts, the TAP controller
determines the test function to be implemented: either loading the Instruction
register (IR), or beginning a serial data scan through a data register (DR). As the
data is scanned in, the state of the JTMS pin signals each new data word and
indicates the end of the data stream. The data register to be selected is
determined by the contents of the Instruction register.


13.3.1 Instruction Register 


The JTAG Instruction register consists of three cells organized as a shift register;
this register is used to select the test to be performed and the test data register to
be accessed. As listed in Table 13-1, the register value selects either the
Boundary-scan register or the Bypass register.


Table 13-1 JTAG Instruction Register Bit Encoding


MSB...LSB   Data Register
0 0 0       Boundary-scan register (external test only)
0 1 1       Setting prohibited
Others      Bypass register


The Instruction register has two stages: a shift register and a parallel output latch.
Refer to 13.3.7 Controller States for details. Figure 13-3 shows the format of the
Instruction register.


(3-bit register: bit 2 is the MSB, bit 0 is the LSB.)


Figure 13-3 Instruction Register
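Table 13-1 amounts to a small decode function. The following minimal C sketch maps a 3-bit Instruction register value to the data register it selects; the type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical names for the outcomes listed in Table 13-1. */
typedef enum { DR_BOUNDARY_SCAN, DR_BYPASS, DR_PROHIBITED } jtag_dr;

/* Return the data register selected by a 3-bit Instruction register value. */
jtag_dr jtag_selected_dr(unsigned ir)
{
    assert(ir <= 7);
    if (ir == 0x0)
        return DR_BOUNDARY_SCAN;   /* 000: Boundary-scan register (external test) */
    if (ir == 0x3)
        return DR_PROHIBITED;      /* 011: setting prohibited */
    return DR_BYPASS;              /* all other values select the Bypass register */
}
```

Both values that the TAP controller loads by itself, 0x7 in the Reset state and 0x4 in the Capture-IR state (see 13.3.7), decode to the Bypass register.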
13.3.2 Bypass Register


The Bypass register is 1 bit wide. When the TAP controller is in the Shift-DR 
(Bypass) state, the data on the JTDI pin is shifted into the Bypass register, and the 
data on Bypass register output shifts to the JTDO output pin. 


In effect, the Bypass register is a short circuit that allows board-level devices in
the boundary-scan chain that do not require a specific test to be bypassed.
The logical location of the Bypass register in the boundary-scan chain is shown in
Figure 13-4. Use of the Bypass register speeds up access to the boundary-scan
registers of those ICs that remain active in the board-level test data path.
Figure 13-4 Bypass Register Operation
13.3.3 Boundary-Scan Register
The Boundary-scan register holds the states of all the input and output pins of the
VR4300 processor, except for some clock and phase-locked loop signals. The
external pins of the VR4300 can be made to drive an arbitrary pattern by
scanning data into the Boundary-scan register in the Shift-DR state. Incoming
data to the processor can be examined by shifting while in the Capture-DR state
with the Boundary-scan register enabled.


The Boundary-scan register is a single 58-bit shift register whose bits are
connected one by one to the input and output pads of the VR4300 processor.
Figure 13-5 shows the most-significant bit of the Boundary-scan register; this one
bit controls the output enable signals on the various bidirectional buses.


(Bit 57: OE1; bits 56 to 0: signal pads.)


Figure 13-5 Output Enable Bit of Boundary-Scan Register


OE1 (jSysADEn) is the JTAG output enable bit for all outputs of the processor. 
Output is enabled when this bit is set to 1 (default state). 


The remaining 57 bits correspond to 57 signal pads. Outputs are enabled when 
this bit is set to 1. 


Table 13-2 lists the scan order of these scan bits. 
13.3.4 Test Access Port (TAP)


The Test Access Port (TAP) consists of the four signal pins: JTDI, JTDO, JTMS, 
and JTCK. These pins control the test to be executed. 


As Figure 13-6 shows, data is serially scanned into one of the three registers 
(Instruction register, Bypass register, or the Boundary-scan register) from the 
JTDI pin, or it is scanned from one of these three registers onto the JTDO pin. 


Data is input to the JTDI pin from the least-significant bit (LSB) of the selected 
register, whereas the most-significant bit (MSB) of the selected register appears 
on the JTDO pin output. 


The JTMS signal controls the state transitions of the main TAP controller state 
machine. 


The JTCK signal is a dedicated test clock that allows serial JTAG data to be 
shifted synchronously, independent of any chip-specific or system clock. 
Figure 13-6 JTAG Test Access Port


The JTDI and JTMS signals are sampled in synchronization with the rising edge 
of the JTCK signal. State on the JTDO signal changes in synchronization with 
the falling edge of the JTCK signal. 




13.3.5 TAP Controller 


The processor incorporates a 16-state TAP controller conforming to the IEEE 
JTAG standard. 


13.3.6 Controller Reset 


The TAP controller can be reset by one of the following: 
* assert the ColdReset signal 


* keep the JTMS signal asserted and input five rising edges of JTCK 
signal 


In either case, keeping JTMS signal asserted maintains the Reset state. 
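The 16-state TAP controller is the standard IEEE 1149.1 state machine, so the JTMS reset sequence can be checked with a transition table. This C sketch encodes the standard next-state function (state names are hypothetical abbreviations; TLR is the state this chapter calls Reset) and shows that five rising JTCK edges with JTMS high reach Reset from any state.

```c
#include <assert.h>

typedef enum {
    TLR, RTI,                                  /* Test-Logic-Reset, Run-Test/Idle */
    SEL_DR, CAP_DR, SHIFT_DR, EXIT1_DR, PAUSE_DR, EXIT2_DR, UPD_DR,
    SEL_IR, CAP_IR, SHIFT_IR, EXIT1_IR, PAUSE_IR, EXIT2_IR, UPD_IR,
    TAP_NSTATES
} tap_state;

/* next_state[state][JTMS] on a rising edge of JTCK (IEEE 1149.1). */
static const tap_state next_state[TAP_NSTATES][2] = {
    [TLR]      = { RTI,      TLR      },
    [RTI]      = { RTI,      SEL_DR   },
    [SEL_DR]   = { CAP_DR,   SEL_IR   },
    [CAP_DR]   = { SHIFT_DR, EXIT1_DR },
    [SHIFT_DR] = { SHIFT_DR, EXIT1_DR },
    [EXIT1_DR] = { PAUSE_DR, UPD_DR   },
    [PAUSE_DR] = { PAUSE_DR, EXIT2_DR },
    [EXIT2_DR] = { SHIFT_DR, UPD_DR   },
    [UPD_DR]   = { RTI,      SEL_DR   },
    [SEL_IR]   = { CAP_IR,   TLR      },
    [CAP_IR]   = { SHIFT_IR, EXIT1_IR },
    [SHIFT_IR] = { SHIFT_IR, EXIT1_IR },
    [EXIT1_IR] = { PAUSE_IR, UPD_IR   },
    [PAUSE_IR] = { PAUSE_IR, EXIT2_IR },
    [EXIT2_IR] = { SHIFT_IR, UPD_IR   },
    [UPD_IR]   = { RTI,      SEL_DR   },
};

/* Advance the TAP controller by one JTCK rising edge. */
tap_state tap_step(tap_state s, unsigned jtms)
{
    assert(s < TAP_NSTATES && jtms <= 1);
    return next_state[s][jtms];
}
```

The longest path to Reset (for example, from Shift-IR through Exit1-IR, Update-IR, Select-DR-Scan, and Select-IR-Scan) takes exactly five edges, which is why five edges suffice; further edges with JTMS high keep the controller in Reset.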


13.3.7 Controller States 


The TAP controller has four basic states: Reset, Capture, Shift, and Update. The
Capture, Shift, and Update states are further classified as IR states (for example,
Shift-IR) or DR states (for example, Capture-DR), depending on whether an
instruction or data is being handled.


Reset State (TAP Controller) 


The value 0x7 is loaded into the parallel output latch, selecting the Bypass register
as default. The most-significant bit of the Boundary-scan register is cleared to 0,
disabling the outputs.


Capture IR State 


The value 0x4 is loaded into the shift register stage. 


Capture DR (Boundary Scan) State 


The data currently on the processor input and I/O pins is latched into the 
Boundary-scan register. In this state, the Boundary-scan register bits 
corresponding to output pins are undefined and cannot be checked during the scan 
out processing. 


Shift IR State 


Data is loaded serially into the shift register stage of the Instruction register from
the JTDI input pin, and the MSB of the Instruction register's shift register stage
is shifted out to the JTDO pin.
Shift DR (Boundary Scan) State


Data is serially shifted into the Boundary-scan register from the JTDI pin, and the 
contents of the Boundary-scan register are serially shifted onto the JTDO pin. 


Update IR State 


The current data in the shift register stage is loaded into the parallel output latch. 


Update DR (Boundary Scan) State 


Data in the Boundary-scan register is latched into the register's parallel output latch.
Bits corresponding to output pins, and those I/O pins whose outputs are enabled 
by the MSB (OE1) of the Boundary-scan register, are loaded onto the processor 
pins. 


Table 13-2 shows the boundary scan order of the processor signals. 


Table 13-2. JTAG Scan Order


No.  Signal Name   No.  Signal Name   No.  Signal Name             No.  Signal Name
1    SysAD4        16   SysAD26       31   SysAD23                 46   SysAD14
2    SysAD3        17   PMaster       32   Int3                    47   SysAD13
3    SysAD2        18   SysAD25       33   SysAD22                 48   SysAD12
4    SysAD1        19   EReq          34   SysAD21                 49   SysAD11
5    SysAD0        20   SysCmd0       35   SysAD20                 50   SysAD10
6    PReq          21   SysCmd1       36   RFU (Input: always 1)   51   Int0
7    SysAD31       22   Reset         37   RFU (Input: always 1)   52   SysAD9
8    PValid        23   EValid        38   TClock                  53   SysAD8
9    SysAD30       24   SysCmd2       39   SyncOut                 54   SysAD7
10   EOK           25   SysCmd3       40   SysAD19                 55   SysAD6
11   SysAD29       26   ColdReset     41   SysAD18                 56   SysAD5
12   SysAD28       27   SysCmd4       42   SysAD17                 57   Int1
13   SysAD27       28   DivMode1      43   Int4                    58   jSysADEn
14   Int2          29   SysAD24       44   SysAD16
15   NMI           30   DivMode0      45   SysAD15
13.4 Notes on Implementation
This section describes points to note about JTAG boundary-scan operation that
are specific to the processor.


The MasterClock, SyncIn, and SyncOut signal pads do not support 
JTAG. 


The update function occurs on the falling edge of JTCK signal after 
the TAP controller enters the Update-DR state. This conforms to the 
IEEE standard. 


The VR4200 generates the update function at the next rising edge. In
other words, its update occurs half a JTCK cycle later than on the VR4300.
Four types of interrupt are available on the VR4300. These are:
* one non-maskable interrupt, NMI
* five external normal interrupts
* two software interrupts
* one timer interrupt


These are described in this chapter. 
14.1 Non-Maskable Interrupt
The non-maskable interrupt request is accepted by asserting the NMI signal (low),
forcing the processor to branch to the Reset Exception vector. The NMI signal is
latched into an internal register in synchronization with the rising edge of the
SClock signal, as shown in Figure 14-1. The NMI signal is edge-triggered, and an
NMI request is acknowledged when the NMI signal is held low for more than one
cycle. This signal must be returned high after the exception occurs. An NMI
request can also be set by an external write request through the SysAD(31:0) bus.
In the data cycle, SysAD6 acts as the NMI request bit (1: requested) and SysAD22
acts as the write enable bit (1: enable) for SysAD6.


NMI only takes effect when the processor pipeline is running. Thus NMI can be 
used to recover the processor from a software hang up (for example, in an infinite 
loop) but cannot be used to recover the processor from a hardware hang up (for 
example, no read response from an external device). NMI cannot cause drive 
contention on the SysAD(31:0) bus and no reset of external agents is required. 


This interrupt cannot be masked. 


Figure 14-1 shows the internal processing of the NMI signal. A low level input
to the NMI pin is latched into an internal register in synchronization with the
rising edge of SClock. Bit 6 of the internal register is then ORed with the inverted
value of the latched NMI signal, and the result is transferred internally as the
non-maskable interrupt request.
Figure 14-1 NMI Signal


14.2 External Normal Interrupts 


These interrupt requests are accepted by asserting the Int(4:0) signals (low). The
Int(4:0) signals are level-triggered and must be kept low until an external
interrupt exception is generated. After an external interrupt exception occurs, the
Int(4:0) signals must be returned high before the processor returns to its normal
routine, or before multiple interrupts are enabled. These interrupt requests can
also be set by an external write request through the SysAD(31:0) bus. During the
data cycle, SysAD(4:0) act as the external interrupt request bits (1: requested) and
SysAD(20:16) act as the write enable bits (1: enable) for SysAD(4:0).


After an external interrupt exception occurs, an external write request must be 
issued to clear the corresponding bit of the interrupt register to 0 before the 
processor returns to its normal routine, or before multiple interrupts are enabled. 


These interrupt requests can be masked with the IM(6:2), IE, EXL, and ERL fields
of the Status register.
14.3 Software Interrupts


These interrupt requests are accepted by setting bit 1 or 0 of the interrupt pending, 
IP, field in the Cause register to 1. These bits can be written by software, but there 
is no hardware mechanism to set or clear these bits. 


After a software interrupt exception occurs, the corresponding bit of the IP field
in the Cause register must be cleared to 0 before the processor returns to its normal
routine, or before multiple interrupts are enabled.


These interrupt requests are maskable with the IM(1:0), IE, EXL, and ERL fields
of the Status register.
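Because IP(1:0) are ordinary software-writable bits, requesting and clearing a software interrupt is a read-modify-write of Cause. The following minimal C sketch operates on a plain integer copy of the Cause register (on the processor, software would read it with MFC0 and write it back with MTC0); it assumes the IP field occupies Cause bits 15:8, as described in 14.5.2, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CAUSE_IP_SHIFT 8u   /* IP(7:0) occupies Cause bits 15:8 */

/* Request software interrupt n (0 or 1) by setting IP(n). */
uint32_t cause_set_sw_int(uint32_t cause, unsigned n)
{
    assert(n <= 1);
    return cause | (1u << (CAUSE_IP_SHIFT + n));
}

/* Clear IP(n), as the handler must do before returning. */
uint32_t cause_clear_sw_int(uint32_t cause, unsigned n)
{
    assert(n <= 1);
    return cause & ~(1u << (CAUSE_IP_SHIFT + n));
}
```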


14.4 Timer Interrupt 


These interrupt requests use bit 7 of the IP (interrupt pending) field in the Cause 
register. The timer interrupt is automatically set and accepted whenever the value 
of the Count register equals the value of the Compare register. 


To clear this interrupt request, either clear the IP7 bit of the Cause register, or
change the contents of the Compare register.


This interrupt request is maskable through the IM7 bit and the IE, EXL, and ERL
fields of the Status register.


14.5 Generation of Interrupt Request Signal 
When an external agent issues an external write request, the data is written to the
Interrupt register. This register can be used in an external write cycle, but not in
an external read cycle.


When data is written to the Interrupt register, the processor ignores the address
issued by the external agent.


Unlike the CP0 registers, this register cannot be read or written by software.


In the data cycle, bits SysAD20 through SysAD16 are used as individual write 
enable bits corresponding to the 5 bits of the Interrupt register. The values 
SysAD4 through SysAD0 are written to the bits of the Interrupt register. 
Therefore, bits 0 through 4 of the Interrupt register can be set or cleared with a
single external write request. Figure 14-2 illustrates this along with the NMI
described earlier.
(Figure: SysAD(4:0) set Interrupt register bits 4:0, with SysAD(20:16) as their write enables; SysAD6 sets the NMI bit (bit 6), with SysAD22 as its write enable. Bits 4:0 feed the logic in Figures 14-3 and 14-4; bit 6 feeds the logic in Figure 14-1.)


Bit            Meaning                                Setting
SysAD(4:0)     External interrupt request Int(4:0)    1: requested, 0: no request (for each bit)
SysAD(20:16)   Write enable bits for SysAD(4:0)       1: enable, 0: disable (for each bit)
SysAD6         NMI                                    1: requested, 0: no request
SysAD22        Write enable bit for SysAD6            1: enable, 0: disable
Figure 14-2 Interrupt Register Bits and Enables Bits 
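The following minimal C sketch shows how an external agent might compose the data word of such an external write, and how the write-enable bits limit which Interrupt register bits change. It models the Interrupt register as a byte whose bits 4:0 hold Int(4:0) and whose bit 6 holds the NMI request, following Figures 14-1 and 14-2; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compose the SysAD(31:0) data word for an external write to the Interrupt
 * register: values in SysAD(4:0), enables in SysAD(20:16),
 * NMI request in SysAD6, NMI enable in SysAD22. */
uint32_t intreg_write_word(unsigned int_values, unsigned int_enables,
                           bool nmi, bool nmi_enable)
{
    assert(int_values <= 0x1F && int_enables <= 0x1F);
    return (uint32_t)int_values |
           ((uint32_t)int_enables << 16) |
           ((uint32_t)nmi << 6) |
           ((uint32_t)nmi_enable << 22);
}

/* Apply the word to a model of the Interrupt register (bits 4:0 = Int(4:0),
 * bit 6 = NMI). Only bits whose write enable is 1 are changed. */
uint8_t intreg_apply(uint8_t reg, uint32_t word)
{
    uint8_t enables = (uint8_t)(((word >> 16) & 0x1Fu) | (((word >> 22) & 1u) << 6));
    uint8_t values  = (uint8_t)(word & 0x5Fu);
    return (uint8_t)((reg & ~enables) | (values & enables));
}
```

For example, the word 0x00050005 sets bits 0 and 2 while leaving bits 1, 3, and 4 untouched, and clearing a serviced request only requires its enable bit with a 0 value.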
Chapter 14


14.5.1 Detection of Hardware Interrupts 


Figure 14-3 shows how the VR4300 hardware interrupt causes are detected
through the Cause register.


• The timer interrupt signal, IP7, is directly detected as bit 15 of the
Cause register.


• The other hardware interrupt signals are detected as follows: bits
4:0 of the Interrupt register are ORed bit by bit with the corresponding
interrupt pins Int(4:0), and the results are input to bits 14:10 of the
Cause register.
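The detection logic above can be sketched as follows. This Python model is an assumption-laden illustration: hardware_cause_ip is a hypothetical name, and int_pins is taken to be the logical request state of Int(4:0) (1 = requested), so the electrical polarity of the pins is not modeled.

```python
def hardware_cause_ip(interrupt_reg: int, int_pins: int, timer: bool) -> int:
    """Return Cause register bits 15:10 (IP7-IP2) in their Cause positions.

    int_pins: logical request state of Int(4:0), 1 meaning "requested"
    (an assumption of this sketch; pin polarity is not modeled).
    """
    # Bits 4:0 of the Interrupt register are ORed bit by bit with Int(4:0) ...
    external = (interrupt_reg | int_pins) & 0x1F
    # ... and the results are input to Cause bits 14:10 (IP6-IP2).
    cause = external << 10
    # The timer interrupt is detected directly as Cause bit 15 (IP7).
    if timer:
        cause |= 1 << 15
    return cause
```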


IP(1:0) of the Cause register are related to software interrupts (refer to Chapter
6 Exception Processing for details). There is no hardware mechanism for setting
or clearing the software interrupts.


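[Figure: each of bits 4:0 of the Interrupt register (an internal register) is ORed with the corresponding interrupt pin Int4 to Int0, and the results, together with the timer interrupt, drive Cause register bits (15:10).]


Figure 14-3 Hardware Interrupt Request Signals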
14.5.2 Masking of Interrupt Request Signals


Figure 14-4 shows the masking of the VR4300 interrupt request signals.


• Cause register bits 15:8 (IP7-IP0) are AND-ORed with Status register
interrupt mask bits 15:8 (IM7-IM0) to mask individual interrupt
signals: each IP bit is ANDed with the corresponding IM bit, and the
results are ORed together.


• Status register bit 0 is the global Interrupt Enable (IE) bit. It is
ANDed with the output of the AND-OR logic block to produce the VR4300
interrupt signal, as shown in Figure 14-4. The EXL and ERL bits of the
Status register also gate these interrupts: an interrupt is accepted
only while both EXL and ERL are 0.
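Combining the masking and enable logic, the decision of whether an interrupt is taken can be sketched as below. The function name is hypothetical; the IE, EXL, and ERL positions (Status bits 0, 1, and 2) follow the usual MIPS R4000-family Status register layout, and the EXL/ERL gating reflects the earlier statement that this interrupt is maskable through the IE, EXL, and ERL fields.

```python
IE_BIT = 1 << 0    # Status.IE: global interrupt enable
EXL_BIT = 1 << 1   # Status.EXL: exception level
ERL_BIT = 1 << 2   # Status.ERL: error level


def interrupt_asserted(cause: int, status: int) -> bool:
    """Sketch of the VR4300 interrupt decision (Figure 14-4 plus EXL/ERL)."""
    ip = (cause >> 8) & 0xFF    # Cause bits 15:8, IP7-IP0
    im = (status >> 8) & 0xFF   # Status bits 15:8, IM7-IM0
    # AND each pending request with its mask bit, then OR the results.
    any_unmasked = (ip & im) != 0
    # The AND-OR result is gated by IE; EXL or ERL set blocks interrupts.
    enabled = bool(status & IE_BIT) and not (status & (EXL_BIT | ERL_BIT))
    return any_unmasked and enabled
```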


[Figure: Cause register bits (15:8), comprising the software interrupts, the external normal interrupts, and the timer interrupt, are ANDed with Status register bits SR(15:8) (IM7 to IM0); the results are ORed and then ANDed with Status register bit SR0 (IE) to produce the VR4300 interrupt.]


Bit       Meaning                 Setting
IE        Enable all interrupts   1: enable
                                  0: disable
IM(7:0)   Mask interrupts         1: enable
                                  0: disable
                                  (for each bit)
IP(7:0)   Interrupt requests      1: request pending
                                  0: no request pending
                                  (for each bit)


Figure 14-4 Masking of Interrupt Requests
Chapter 15


One of the design objectives of the VR4300 processor is to minimize power
consumption, making the processor suitable for battery-operated systems and
for environments where low power consumption and low heat dissipation are
desirable.


To accomplish this, the VR4300 provides power management features, described
in this chapter, that dynamically reduce power consumption.
15.1 Features


The VR4300 has three processor-level operating modes: normal, low power (100
MHz model of the VR4300 and the VR4305 only), and power off.


These modes allow processor power consumption to be managed by system logic. 


A notebook system generally has many different levels of power management.
System logic is responsible for switching the processor among the three
available modes to reflect the power management state of the system.


15.1.1 Normal Power Mode 


In normal mode, the pipeline clock (PClock) is generated from the input clock
(MasterClock). The ratio of the PClock frequency to the MasterClock frequency is
set by the DivMode(1:0)* pins. For details of the settings, refer to 2.2.2
Clock/Control Interface Signals.


The frequency of the system interface clock (SClock) is the same as that of 
MasterClock. 


Normal mode is the default operating mode. The processor enters normal mode
after reset.


* In the VR4300 and VR4305. In the VR4310, DivMode(2:0).


15.1.2 Low Power Mode 
The low power mode is supported only in the 100 MHz model of the VR4300 and
the VR4305.


The processor operates in low power mode when the RP bit of the Status
register is set. On entering this mode, the processor first stalls the pipeline
and enters a quiescent state, in which the store buffer is emptied and all
outstanding cache misses are processed.


The PClock frequency drops to 1/4 of its normal value. The SClock and TClock
frequencies also drop to 1/4 of their normal values.


Example: DivMode(1:0) = 10 in the 100 MHz model of the VR4300


                  MasterClock   PClock    SClock, TClock
Normal mode       50 MHz        100 MHz   50 MHz
Low power mode    50 MHz        25 MHz    12.5 MHz
Low power mode can reduce the power consumption of the processor to about
25% of the normal level. When setting or clearing the RP bit, software must
ensure that the system continues to operate correctly.
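The clock relationships in the example table can be checked with a small calculation. In this sketch, clock_frequencies is a hypothetical helper, and the PClock:MasterClock ratio is passed in directly rather than decoded from DivMode, because the full DivMode encoding is given in 2.2.2, not here.

```python
def clock_frequencies(master_mhz: float, pclock_ratio: float, low_power: bool) -> dict:
    """Return PClock, SClock, and TClock in MHz (illustrative model).

    pclock_ratio: PClock:MasterClock ratio selected by DivMode (assumed input).
    """
    pclock = master_mhz * pclock_ratio
    sclock = master_mhz                  # normal mode: SClock equals MasterClock
    divisor = 4 if low_power else 1      # low power mode: all three drop to 1/4
    return {
        "PClock": pclock / divisor,
        "SClock": sclock / divisor,
        "TClock": sclock / divisor,      # TClock tracks SClock in the table
    }
```

With a 50 MHz MasterClock and a ratio of 2, this reproduces both rows of the table: 100/50/50 MHz in normal mode and 25/12.5/12.5 MHz in low power mode.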
Also keep in mind the following points.


1. Circuits such as the DRAM refresh counter behave differently when the
operating frequency changes. Therefore, first write new values to those
registers of the external agent that are directly affected by the change in
frequency.

2. Make sure that the system interface is inactive. For example, execute an
instruction that reads an uncached area, and make sure that the write buffer
is empty after that instruction has executed. The RP bit can then be set or
cleared.


3. Make sure that none of the eight instructions before and after the MTC0
instruction that sets or clears the RP bit causes a cache miss or an
exception such as a TLB miss exception.


15.1.3 Power Off Mode 


In the power off mode, power supply to the processor is entirely cut off and 
operation of the processor stops completely. 


Before the processor enters power off mode, its state is written to non-volatile
memory. When the processor returns to normal mode, all registers are
restored to their previous state.


To support power off mode, all internal state information needed to restore
the processor after power off can be both read and written. Before power off,
this information must be saved to externally connected non-volatile memory.


The system is responsible for powering off the chip when the system is idle.
The Load Link (LL) bit does not need to be saved, since it is automatically
cleared at cache start-up.


Cache contents are not retained; therefore, the cache should be written back to
memory during the power-off routine and invalidated during the power-on
routine. The VR4300 supports CACHE instructions and TLB operation instructions
that invalidate all cache and TLB contents.
Chapter 16 CPU Instruction Set Details


This chapter provides a detailed description of the function of each VR4300 CPU
instruction in both 32- and 64-bit modes. The instructions are listed in
alphabetical order.


For details of the FPU instruction set, refer to Chapter 17 FPU Instruction Set 
Details. 
16.1 Instruction Notation Conventions


In this chapter, all variable subfields in an instruction format (such as rs, rt, and
immediate) are shown in lowercase. Instruction names (such as ADD and SUB) are
shown in uppercase. For clarity, an alias is sometimes used for a subfield in
specific instructions; for example, rs = base is used for load and store
instructions. Such an alias is always written in lowercase, since it also refers
to a subfield.


The actual encodings of all the mnemonics are given in 16.7 CPU Instruction
Opcode Bit Encoding, and the bit encoding also accompanies each instruction
description.


In the instruction descriptions, the Operation section describes the operation
performed by each instruction in a high-level language notation. The VR4300
can operate in either 32- or 64-bit mode, and differences between the two modes
are shown in the Operation section. Special symbols used in the notation are
described in Table 16-1.
Table 16-1 CPU Instruction Operation Notations


Symbol          Meaning
←               Assignment.
||              Bit string concatenation.
x^{y}           Replication of bit value x into a y-bit string. x is always a
                single-bit value.
x_{y..z}        Selection of bits y through z of bit string x. Little-endian bit
                notation is always used. If y is less than z, this expression is
                an empty (zero length) bit string.
+               2's complement or floating-point addition.
-               2's complement or floating-point subtraction.
*               2's complement or floating-point multiplication.
div             2's complement integer division.
mod             2's complement remainder.
/               Floating-point division.
<               2's complement less-than comparison.
and             Bit-wise logical AND.
or              Bit-wise logical OR.
xor             Bit-wise logical XOR.
nor             Bit-wise logical NOR.
GPR[x]          General purpose register x. The content of GPR[0] is always
                zero. Attempts to alter the content of GPR[0] have no effect.
CPR[z,x]        Coprocessor unit z, general purpose register x.
CCR[z,x]        Coprocessor unit z, control register x.
COC[z]          Coprocessor unit z condition signal.
BigEndianMem    Endian mode as configured at reset (0: little, 1: big).
                Specifies the endianness of the memory interface (see
                LoadMemory and StoreMemory), and the endianness of Kernel and
                Supervisor modes.
ReverseEndian   Signal to reverse the endianness of load and store
                instructions. This feature is available in User mode only, and
                is effected by setting the RE bit of the Status register. Thus,
                ReverseEndian is 1 only when the RE bit is set in User mode.
BigEndianCPU    The endianness for load and store instructions (0: little,
                1: big). In User mode, this endianness is reversed by setting
                the RE bit. Thus, BigEndianCPU is calculated as BigEndianMem XOR
                ReverseEndian.
LLbit           Bit indicating the synchronization state of instructions. Set
                by the LL instruction, cleared by the ERET instruction, and read
                by the SC instruction.
T+i:            Indicates the time steps between operations. Statements within
                a time step are defined to be executed in sequential order
                (instruction execution order may be changed by conditional
                branches and loops). Operations marked T+i: are executed at
                instruction cycle i relative to the start of execution of the
                instruction. Thus, an instruction that starts at time j
                executes operations marked T+i: at time i + j. The order is not
                defined for instructions or operations executed at the same
                time.
Instruction Notation Examples


The following are examples of the instruction notations:


Example #1:
GPR[rt] ← immediate || 0^{16}


The immediate value (normally 16 bits) is concatenated with sixteen
low-order zero bits, and the resulting 32-bit string is assigned to
CPU general purpose register rt.


Example #2:


(immediate_{15})^{16} || immediate_{15..0}


Bit 15 (the sign bit) of the immediate value is replicated 16 times,
and the result is concatenated with bits 15 through 0 of the
immediate value to form a 32-bit sign-extended value.
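To make the notation concrete, the following Python helpers (all names hypothetical) implement replication, bit selection, and concatenation on plain integers, and then evaluate the two examples. An empty bit string is represented here as 0, which is a modeling choice rather than part of the notation.

```python
def replicate(bit: int, count: int) -> int:
    """x^{y}: a count-bit string made of copies of the single bit x."""
    return (1 << count) - 1 if bit else 0


def select(x: int, hi: int, lo: int) -> int:
    """x_{hi..lo}: bits hi through lo of x (little-endian bit numbering)."""
    if hi < lo:
        return 0  # empty bit string, represented as 0 in this sketch
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def concat(*parts: tuple[int, int]) -> int:
    """a || b || ...: each part is (value, width); the leftmost part is high-order."""
    result = 0
    for value, width in parts:
        result = (result << width) | (value & ((1 << width) - 1))
    return result


def example1(immediate: int) -> int:
    """Example #1: immediate || 0^{16}."""
    return concat((immediate, 16), (0, 16))


def example2(immediate: int) -> int:
    """Example #2: (immediate_{15})^{16} || immediate_{15..0}."""
    sign = select(immediate, 15, 15)
    return concat((replicate(sign, 16), 16), (select(immediate, 15, 0), 16))
```

For instance, example2(0x8000) yields 0xFFFF8000, while example2(0x7FFF) yields 0x00007FFF.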
16.2 Load and Store Instructions


In the VR4300, the instruction immediately following a load instruction can use
the loaded register contents. In such cases, the hardware interlocks for one
PCycle, so scheduling load delay slots is desirable for performance, although
it is not required for functional correctness.


The VR4300 implementation of the MIPS ISA provides two special instructions,
Load Linked (LL) and Store Conditional (SC). These instructions are used in
carefully coded sequences to implement synchronization primitives such as
test-and-set, bit-level locks, semaphores, and sequencers/event counters.
Such synchronization is essential in multiprocessor systems; the VR4300
includes this functionality primarily to maintain compatibility with the
VR4000 and VR4200.
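The LL/SC pattern can be illustrated with a toy model. This Python sketch assumes a single processor and represents memory as a dictionary; LinkedMemory and atomic_increment are hypothetical names. It follows the LLbit rules in Table 16-1 (set by LL, cleared by ERET, read by SC) and shows why a retry loop is needed.

```python
class LinkedMemory:
    """Toy single-processor model of LL/SC (hypothetical class).

    Caches, address matching, and other processors are not modeled.
    """

    def __init__(self) -> None:
        self.mem: dict[int, int] = {}
        self.llbit = False

    def ll(self, addr: int) -> int:
        # LL loads the value and sets LLbit.
        self.llbit = True
        return self.mem.get(addr, 0)

    def sc(self, addr: int, value: int) -> int:
        """Store only if LLbit is still set; return 1 on success, 0 on failure."""
        if self.llbit:
            self.mem[addr] = value
            return 1
        return 0

    def eret(self) -> None:
        # Returning from an exception clears LLbit, breaking the link.
        self.llbit = False


def atomic_increment(m: LinkedMemory, addr: int, interrupt_once: bool = False) -> int:
    """Classic LL/SC retry loop; returns the number of attempts made."""
    attempts = 0
    while True:
        attempts += 1
        old = m.ll(addr)
        if interrupt_once and attempts == 1:
            m.eret()  # simulate an exception taken between LL and SC
        if m.sc(addr, old + 1):
            return attempts
```

If an exception intervenes between LL and SC, the first SC fails and the loop simply retries, so the increment is never lost or applied twice.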


In the load and store instruction descriptions, the functions listed below are used 
to simplify the handling of virtual addresses and physical memory. 


Table 16-2 Load and Store Instruction Common Functions


Function             Meaning
AddressTranslation   Uses the TLB to find the physical address corresponding
                     to a virtual address. If the TLB does not contain the
                     requested translation, this function fails and a TLB miss
                     exception occurs.
LoadMemory           Searches the cache and main memory for the contents of
                     the specified data length stored at the specified
                     physical address. If the specified data length is less
                     than a word, the data at the position determined by the
                     endian mode and reverse endian mode of the processor is
                     loaded. The low-order 3 bits of the address and the
                     access type field determine the data position within a
                     data word. The data is loaded into the cache if the cache
                     is enabled.
StoreMemory          Searches the cache, write buffer, and main memory to
                     store the contents of the specified data length at the
                     specified physical address. If the specified data length
                     is less than a word, the data is stored at the position
                     determined by the endian mode and reverse endian mode of
                     the processor. The low-order 3 bits of the address and the
                     access type field determine the data position within a
                     data word.
The Access Type field indicates the size of the data to be loaded or stored.
Regardless of access type or byte order (endianness), the address specifies the byte 
which has the smallest byte address in the field accessed. For a big-endian system, 
this is the leftmost byte and contains the sign for a 2’s complement value; for a 
little-endian system, this is the rightmost byte. 


Table 16-3 Access Type Specifications for Load/Store Instructions 


Access Type SysCmd(2:0) Meaning 
DOUBLEWORD 7 8 bytes (64 bits) 
SEPTIBYTE 6 7 bytes (56 bits) 
SEXTIBYTE 5 6 bytes (48 bits) 
QUINTIBYTE 4 5 bytes (40 bits) 
WORD 3 4 bytes (32 bits) 
TRIPLEBYTE 2 3 bytes (24 bits) 
HALFWORD 1 2 bytes (16 bits) 
BYTE 0 1 byte (8 bits) 


The bytes within the accessed doubleword can be determined directly from the 
access type and the low-order three bits of the address. 
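As a sketch of how the accessed bytes follow from the access type and the low-order address bits, the function below returns the byte lanes of a 64-bit doubleword (lane 0 = bits 7:0, lane 7 = bits 63:56). It is hypothetical, assumes the access does not cross a doubleword boundary, and applies the rule above that the smallest byte address is the leftmost byte in big-endian mode and the rightmost byte in little-endian mode.

```python
def byte_lanes(access_type: int, addr_low3: int, big_endian: bool) -> range:
    """Byte lanes touched within a doubleword (hypothetical helper).

    access_type: SysCmd(2:0) code from Table 16-3 (0 = BYTE ... 7 = DOUBLEWORD).
    addr_low3:   low-order three bits of the address.
    """
    size = access_type + 1  # Table 16-3: code n means n + 1 bytes
    if addr_low3 + size > 8:
        raise ValueError("access crosses a doubleword boundary")
    if big_endian:
        # The smallest byte address occupies the leftmost (highest) lane.
        first = 8 - addr_low3 - size
    else:
        # The smallest byte address occupies the rightmost (lowest) lane.
        first = addr_low3
    return range(first, first + size)
```

For a WORD access at address offset 4, the sketch gives lanes 4 to 7 (bits 63:32) in little-endian mode and lanes 0 to 3 (bits 31:0) in big-endian mode.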
16.3 Jump and Branch Instructions


All jump and branch instructions have a delay of exactly one instruction.
That is, the instruction immediately following a jump or branch instruction (the
instruction in the delay slot) is executed while the target instruction is being
fetched from the cache. A jump or branch instruction must not be placed in a
delay slot; if one is, the error is not detected and the results are undefined.


If an exception or interrupt prevents the completion of an instruction in a delay
slot, the hardware sets the EPC register to the virtual address of the preceding
jump or branch instruction. When exception or interrupt processing is complete
and the program resumes, both the jump or branch instruction and the instruction
in the delay slot are re-executed.


Because jump and branch instructions may be re-executed after exception or
interrupt processing, register r31 (the register in which the link address is
stored) should not be used as a source register in jump and link or branch and
link instructions.
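The re-execution rule has a practical consequence that a short model makes visible. In this simplified Python sketch (32-bit addresses, hypothetical function names), exception_epc computes the EPC value, and jalr shows that re-executing a jump and link that uses r31 as its source reads the link address it already wrote instead of the original target.

```python
def exception_epc(faulting_pc: int, in_delay_slot: bool) -> int:
    """EPC value: the preceding jump/branch if the fault is in its delay slot."""
    return faulting_pc - 4 if in_delay_slot else faulting_pc


def jalr(regs: dict[int, int], pc: int, rs: int, rd: int = 31) -> int:
    """Simplified jump and link register: read the target, then write the link."""
    target = regs[rs]
    regs[rd] = pc + 8  # address of the instruction after the delay slot
    return target
```

If the delay slot of a jump and link at 0x400 that reads r31 faults, execution resumes at 0x400, and the second execution jumps to 0x408 instead of the original target.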


Since instructions must be word-aligned, a Jump Register or Jump and Link
Register instruction must use a register that contains an address whose
low-order two bits are zero. If these two bits are not zero, an address error
exception occurs when the instruction at the jump destination is fetched.


16.4 Coprocessor Instructions 


Coprocessors are alternate execution units with register files separate from
that of the CPU. The MIPS architecture provides four coprocessor units, and each
coprocessor has two register spaces, each containing thirty-two 32-bit
registers.


• The first space, the coprocessor general purpose registers, can be loaded
directly from and stored directly into main memory, and its contents can
be transferred between the coprocessor and the processor.


• The second space, the coprocessor control registers, can only have its
contents transferred between the coprocessor and the processor.
Coprocessor instructions may alter registers in either space.
16.5 System Control Coprocessor (CP0) Instructions


Some restrictions apply to operations involving CP0, which is incorporated
within the CPU. Although the MIPS architecture generally permits load and store
instructions that transfer data to and from coprocessors, and instructions that
exchange control codes with coprocessors, CP0 is given a protected status
because it is responsible for exception handling and memory management.
Therefore, the coprocessor transfer instructions are the only valid way to
read and write the CP0 registers.


Some CP0 instructions are defined to read, write, and probe TLB entries
directly, and to change operating modes in preparation for returning to User
mode or to an interrupt-enabled state.


16.6 CPU Instructions 
This section describes in detail the function of each CPU instruction in 32- and
64-bit modes.


The exceptions that execution of an instruction can cause are listed at the end
of its description. Refer to Chapter 6 Exception Processing for details of
exceptions and their processing.
ADD Add ADD


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 ADD 
000000 00000 100000 
6 5 5 5 5 6 
Format: 
ADD rd, rs, rt
Description: 
The contents of general purpose registers rs and rt are added, and the result is
stored in general purpose register rd. In 64-bit mode, the operands must be
sign-extended 32-bit values.
An integer overflow exception occurs if the carries out of bits 30 and 31 differ (2's
complement overflow). The destination register rd is not modified when an
integer overflow exception occurs.
Operation:
32  T:  GPR[rd] ← GPR[rs] + GPR[rt]
64  T:  temp ← GPR[rs] + GPR[rt]
        GPR[rd] ← (temp_{31})^{32} || temp_{31..0}
Exceptions: 


Integer overflow exception 
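The overflow rule for ADD (carries out of bits 30 and 31 differ) can be modeled directly. This Python sketch is illustrative: IntegerOverflow is a hypothetical stand-in for the integer overflow exception, the operands are taken as their low 32 bits, and the result is returned in the 64-bit sign-extended form shown in the Operation section.

```python
MASK32 = 0xFFFFFFFF


class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception (hypothetical)."""


def sign_extend_32(value: int) -> int:
    """(value_{31})^{32} || value_{31..0}, as a 64-bit unsigned pattern."""
    value &= MASK32
    return value | 0xFFFFFFFF00000000 if value & 0x80000000 else value


def add(rs_val: int, rt_val: int) -> int:
    """ADD on 32-bit operands; returns the 64-bit sign-extended result."""
    a, b = rs_val & MASK32, rt_val & MASK32
    # Carry out of bit 30 = carry produced by adding the low 31 bits.
    carry30 = ((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF)) >> 31
    # Carry out of bit 31 = carry produced by the full 32-bit addition.
    carry31 = (a + b) >> 32
    if carry30 != carry31:
        raise IntegerOverflow  # the caller leaves rd unmodified
    return sign_extend_32(a + b)
```

Adding 0x7FFFFFFF and 1 raises the overflow (a positive result would need 33 bits), while adding -2 and -1 produces the sign-extended value 0xFFFFFFFFFFFFFFFD without an exception.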
ADDI Add Immediate ADDI


31 26 25 21 20 16 15 0 
ADDI rs rt immediate 
001000 
6 5 5 16 
Format: 


ADDI rt, rs, immediate 


Description: 


The 16-bit immediate is sign-extended and added to the contents of general
purpose register rs, and the result is stored in general purpose register rt. In
64-bit mode, the operand must be a sign-extended 32-bit value.


An integer overflow exception occurs if the carries out of bits 30 and 31 differ (2's
complement overflow). The destination register rt is not modified when an
integer overflow exception occurs.


Operation: 


32  T:  GPR[rt] ← GPR[rs] + (immediate_{15})^{16} || immediate_{15..0}


64  T:  temp ← GPR[rs] + (immediate_{15})^{48} || immediate_{15..0}
        GPR[rt] ← (temp_{31})^{32} || temp_{31..0}


Exceptions: 


Integer overflow exception 
ADDIU Add Immediate Unsigned ADDIU


31 26 25 21 20 16 15 0 
ADDIU rs rt immediate
001001 
6 5 5 16 


Format: 
ADDIU rt, rs, immediate


Description: 


The 16-bit immediate is sign-extended and added to the contents of general
purpose register rs, and the result is stored in general purpose register rt. No
integer overflow exception occurs under any circumstances. In 64-bit mode, the
operand must be a sign-extended 32-bit value.


The only difference between this instruction and the ADDI instruction is that
the ADDIU instruction never causes an integer overflow exception.


Operation: 


32  T:  GPR[rt] ← GPR[rs] + (immediate_{15})^{16} || immediate_{15..0}


64  T:  temp ← GPR[rs] + (immediate_{15})^{48} || immediate_{15..0}
        GPR[rt] ← (temp_{31})^{32} || temp_{31..0}


Exceptions: 


None 
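ADDIU is a common source of confusion because "unsigned" refers only to the absence of an overflow exception; the immediate is still sign-extended. The hypothetical helpers below sketch the Operation section in Python, returning the 64-bit sign-extended result.

```python
def sign_extend_16(imm: int) -> int:
    """(immediate_{15})^{16} || immediate_{15..0}, as a 32-bit pattern."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm


def addiu(rs_val: int, immediate: int) -> int:
    """ADDIU: 32-bit wraparound add of a sign-extended immediate; never traps."""
    # Only the low 32 bits of the sum matter (temp_{31..0}).
    temp = (rs_val + sign_extend_16(immediate)) & 0xFFFFFFFF
    # GPR[rt] <- (temp_{31})^{32} || temp_{31..0}
    return temp | 0xFFFFFFFF00000000 if temp & 0x80000000 else temp
```

For example, an immediate of 0xFFFF is treated as -1, so addiu(10, 0xFFFF) yields 9, and addiu(0x7FFFFFFF, 1) wraps to 0xFFFFFFFF80000000 without an exception.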
ADDU Add Unsigned ADDU


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 ADDU 
000000 00000 100001 
6 5 5 5 5 6 

Format: 


ADDU rd, rs, rt 


Description: 


The contents of general purpose registers rs and rt are added, and the result is
stored in general purpose register rd. No integer overflow exception occurs
under any circumstances. In 64-bit mode, the operands must be sign-extended
32-bit values.


The only difference between this instruction and the ADD instruction is that
the ADDU instruction never causes an integer overflow exception.


Operation: 


32  T:  GPR[rd] ← GPR[rs] + GPR[rt]


64  T:  temp ← GPR[rs] + GPR[rt]
        GPR[rd] ← (temp_{31})^{32} || temp_{31..0}


Exceptions: 


None 
AND And AND


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 AND 
000000 00000 100100 

6 5 5 5 5 6 


Format: 
AND rd, rs, rt


Description: 


The contents of general purpose register rs are combined with the contents of 
general purpose register rt in a bit-wise logical AND operation. The result is 
stored in general purpose register rd. 


Operation: 


32  T:  GPR[rd] ← GPR[rs] and GPR[rt]


64  T:  GPR[rd] ← GPR[rs] and GPR[rt]


Exceptions: 


None 
ANDI And Immediate ANDI


31 26 25 21 20 16 15 0 


ANDI rs rt immediate 
001100 


6 5 5 16 


Format: 


ANDI rt, rs, immediate 


Description: 


The 16-bit immediate is zero-extended and combined with the contents of general
purpose register rs in a bit-wise logical AND operation. The result is stored in
general purpose register rt.


Operation: 


32  T:  GPR[rt] ← 0^{16} || (immediate and GPR[rs]_{15..0})


64  T:  GPR[rt] ← 0^{48} || (immediate and GPR[rs]_{15..0})


Exceptions: 


None 
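By contrast with ADDI and ADDIU, ANDI zero-extends its immediate. A one-line Python sketch (hypothetical name) makes the difference visible: an immediate with bit 15 set does not propagate ones into the upper bits of the result.

```python
def andi(rs_val: int, immediate: int) -> int:
    """ANDI: 0^{48} || (immediate and GPR[rs]_{15..0}); upper bits are always 0."""
    return (immediate & 0xFFFF) & (rs_val & 0xFFFF)
```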
BCzF Branch On Coprocessor z False BCzF


31 26 25 21 20 16 15 0
COPz BC BCF offset 
0100xx* 01000 00000 
6 5 5 16 

Format: 
BCzF offset 

Description: 
A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If CPz’s 
condition signal (CpCond), as sampled during the previous instruction execution, 
is false, then the program branches to the branch address with a delay of one 
instruction. 
Because the condition signal is sampled during the previous instruction execution, 
there must be at least one instruction between this instruction and a coprocessor 
instruction that changes the condition signal. 

Operation:

32  T-1: condition ← not COC[z]
    T:   target ← (offset_{15})^{14} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         endif
64  T-1: condition ← not COC[z]
    T:   target ← (offset_{15})^{46} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         endif


* Refer to the table Opcode Bit Encoding on the next page, or 16.7 CPU 
Instruction Opcode Bit Encoding. 


User’s Manual U10504EJ7VOUM00 377 


Chapter 16 


BCzF Branch On Coprocessor z False BCzF


(continued) 


Exceptions: 


Coprocessor unusable exception 


Opcode Bit Encoding:


        Bit #  31 30 29 28 27 26   25 24 23 22 21   20 19 18 17 16   15 ... 0
BC0F            0  1  0  0  0  0    0  1  0  0  0    0  0  0  0  0    offset
BC1F            0  1  0  0  0  1    0  1  0  0  0    0  0  0  0  0    offset
BC2F            0  1  0  0  1  0    0  1  0  0  0    0  0  0  0  0    offset
               Opcode               BC sub-opcode    Branch condition
               (bits 27:26: coprocessor number)
BCzFL Branch On Coprocessor z False Likely BCzFL


31 26 25 21 20 16 15 0
COPz BC BCFL offset 
0100xx* 01000 00010 
6 5 5 16 
Format: 


BCzFL offset


Description: 
A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the 
CPz’s condition signal (CpCond), as sampled during the previous instruction 
execution, is false, the program branches to the branch address with a delay of one 
instruction. 


If it does not branch, the instruction in the branch delay slot is discarded. 


Because the condition signal is sampled during the previous instruction execution, 
there must be at least one instruction between this instruction and a coprocessor 
instruction that changes the condition signal. 


* Refer to the table Opcode Bit Encoding on the next page, or 
16.7 CPU Instruction Opcode Bit Encoding. 
BCzFL Branch On Coprocessor z False Likely BCzFL


(continued) 


Operation:


32  T-1: condition ← not COC[z]
    T:   target ← (offset_{15})^{14} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         else
             NullifyCurrentInstruction
         endif
64  T-1: condition ← not COC[z]
    T:   target ← (offset_{15})^{46} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         else
             NullifyCurrentInstruction
         endif


Exceptions: 


Coprocessor unusable exception 


Opcode Bit Encoding:


        Bit #  31 30 29 28 27 26   25 24 23 22 21   20 19 18 17 16   15 ... 0
BC0FL           0  1  0  0  0  0    0  1  0  0  0    0  0  0  1  0    offset
BC1FL           0  1  0  0  0  1    0  1  0  0  0    0  0  0  1  0    offset
BC2FL           0  1  0  0  1  0    0  1  0  0  0    0  0  0  1  0    offset
               Opcode               BC sub-opcode    Branch condition
               (bits 27:26: coprocessor number)
BCzT Branch On Coprocessor z True BCzT


31 26 25 21 20 16 15 0
COPz BC BCT offset 
0100xx* 01000 00001 
6 5 5 16 


Format: 
BCzT offset 


Description: 
A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the 
CPz’s condition signal (CpCond) sampled during the previous instruction 
execution is true, then the program branches to the branch address with a delay of 
one instruction. 
Because the condition signal is sampled during the previous instruction execution, 
there must be at least one instruction between this instruction and a coprocessor 
instruction that changes the condition signal. 


Operation:


32  T-1: condition ← COC[z]
    T:   target ← (offset_{15})^{14} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         endif
64  T-1: condition ← COC[z]
    T:   target ← (offset_{15})^{46} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         endif


* Refer to the table Opcode Bit Encoding on the next page, or 
16.7 CPU Instruction Opcode Bit Encoding. 
BCzT Branch On Coprocessor z True BCzT


(continued) 


Exceptions: 


Coprocessor unusable exception 


Opcode Bit Encoding:


        Bit #  31 30 29 28 27 26   25 24 23 22 21   20 19 18 17 16   15 ... 0
BC0T            0  1  0  0  0  0    0  1  0  0  0    0  0  0  0  1    offset
BC1T            0  1  0  0  0  1    0  1  0  0  0    0  0  0  0  1    offset
BC2T            0  1  0  0  1  0    0  1  0  0  0    0  0  0  0  1    offset
               Opcode               BC sub-opcode    Branch condition
               (bits 27:26: coprocessor number)
BCzTL Branch On Coprocessor z True Likely BCzTL


31 26 25 21 20 16 15 0
COPz BC BCTL offset 
0100xx* 01000 00011 
6 5 5 16 


Format: 
BCzTL offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the 
CPz’s condition signal (CpCond), as sampled during the previous instruction 
execution, is true, the program branches to the branch address with a delay of one 
instruction. 


If it does not branch, the instruction in the branch delay slot is discarded. 


Because the condition signal is sampled during the previous instruction execution, 
there must be at least one instruction between this instruction and a coprocessor 
instruction that changes the condition signal. 


Operation:

32  T-1: condition ← COC[z]
    T:   target ← (offset_{15})^{14} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         else
             NullifyCurrentInstruction
         endif
64  T-1: condition ← COC[z]
    T:   target ← (offset_{15})^{46} || offset || 0^{2}
    T+1: if condition then
             PC ← PC + target
         else
             NullifyCurrentInstruction
         endif


* Refer to the table Opcode Bit Encoding on the next page, 
or 16.7 CPU Instruction Opcode Bit Encoding. 
BCzTL Branch On Coprocessor z True Likely BCzTL


(continued) 


Exceptions: 


Coprocessor unusable exception 


Opcode Bit Encoding:


        Bit #  31 30 29 28 27 26   25 24 23 22 21   20 19 18 17 16   15 ... 0
BC0TL           0  1  0  0  0  0    0  1  0  0  0    0  0  0  1  1    offset
BC1TL           0  1  0  0  0  1    0  1  0  0  0    0  0  0  1  1    offset
BC2TL           0  1  0  0  1  0    0  1  0  0  0    0  0  0  1  1    offset
               Opcode               BC sub-opcode    Branch condition
               (bits 27:26: coprocessor number)
CPU Instruction Set Details


BEQ Branch On Equal BEQ


31 26 25 21 20 16 15 0 
BEQ rs rt offset 
000100 
6 5 5 16 
Format: 


BEQ rs, rt, offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. The 
contents of general purpose register rs and the contents of general purpose register 
rt are compared. If the two registers are equal, then the program branches to the 
branch address with a delay of one instruction. 


Operation:


32  T:   target ← (offset_{15})^{14} || offset || 0^{2}
         condition ← (GPR[rs] = GPR[rt])
    T+1: if condition then
             PC ← PC + target
         endif
64  T:   target ← (offset_{15})^{46} || offset || 0^{2}
         condition ← (GPR[rs] = GPR[rt])
    T+1: if condition then
             PC ← PC + target
         endif


Exceptions: 


None 
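The branch address computation shared by BEQ and the other branch instructions can be sketched as follows. branch_target is a hypothetical helper; it takes the address of the branch instruction itself, so the delay slot is at that address plus 4, and it wraps the result to 32 or 64 bits.

```python
def branch_target(branch_pc: int, offset: int, mode64: bool = False) -> int:
    """Branch address: delay-slot address + (sign-extended offset << 2)."""
    width = 64 if mode64 else 32
    mask = (1 << width) - 1
    offset &= 0xFFFF
    # Sign-extend the 16-bit offset, then shift it left two bits.
    signed = offset - 0x10000 if offset & 0x8000 else offset
    delay_slot = branch_pc + 4
    return (delay_slot + (signed << 2)) & mask
```

An offset of 0xFFFF (-1) therefore targets the branch instruction itself, since the delay slot address minus 4 is the branch address.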
BEQL Branch On Equal Likely BEQL


31 26 25 21 20 16 15 0 
BEQL rs rt offset 
010100 
6 5 5 16 
Format: 


BEQL rs, rt, offset 


Description:


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. The
contents of general purpose register rs and the contents of general purpose register
rt are compared. If the two registers are equal, the program branches to the branch
address with a delay of one instruction.


If it does not branch, the instruction in the branch delay slot is discarded. 


Operation:


32  T:   target ← (offset_{15})^{14} || offset || 0^{2}
         condition ← (GPR[rs] = GPR[rt])
    T+1: if condition then
             PC ← PC + target
         else
             NullifyCurrentInstruction
         endif
64  T:   target ← (offset_{15})^{46} || offset || 0^{2}
         condition ← (GPR[rs] = GPR[rt])
    T+1: if condition then
             PC ← PC + target
         else
             NullifyCurrentInstruction
         endif


Exceptions: 


None 
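The difference between an ordinary branch and a branch likely lies only in what happens to the delay slot when the branch is not taken. The hypothetical Python helper below sketches this for 32-bit addresses, returning the addresses executed after the branch (the delay slot, when it runs) followed by the next fetch address.

```python
def next_pcs(branch_pc: int, taken: bool, target: int, likely: bool) -> list[int]:
    """Delay-slot address (if executed), then the next fetch address."""
    delay_slot = branch_pc + 4
    if taken:
        return [delay_slot, target]          # delay slot always runs when taken
    if likely:
        return [delay_slot + 4]              # not taken: delay slot is nullified
    return [delay_slot, delay_slot + 4]      # ordinary branch: delay slot runs
```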
BGEZ Branch On Greater Than Or Equal To Zero BGEZ


31 26 25 21 20 16 15 0 
REGIMM rs BGEZ offset 
000001 00001 
6 5 5 16 


Format: 
BGEZ rs, offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the 
contents of general purpose register rs are equal to or larger than 0, then the 
program branches to the branch address with a delay of one instruction. 


Operation:


32  T:   target ← (offset_{15})^{14} || offset || 0^{2}
         condition ← (GPR[rs]_{31} = 0)
    T+1: if condition then
             PC ← PC + target
         endif
64  T:   target ← (offset_{15})^{46} || offset || 0^{2}
         condition ← (GPR[rs]_{63} = 0)
    T+1: if condition then
             PC ← PC + target
         endif


Exceptions: 


None 
BGEZAL Branch On Greater Than Or Equal To Zero And Link BGEZAL


31 26 25 21 20 16 15 0 
REGIMM rs BGEZAL offset 
000001 10001 
6 5 5 16 
Format: 


BGEZAL rs, offset


Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended.
Unconditionally, the address of the instruction following the delay slot is stored in
the link register, r31. If the contents of general purpose register rs are greater
than or equal to 0, the program branches to the branch address with a delay of one
instruction.


General purpose register r31 should generally not be specified as rs, because
storing the link address overwrites the contents of rs, and the instruction may then
not be re-executable. However, executing this instruction with rs = r31 does not
cause an exception.
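The following is a minimal model of why rs = r31 is discouraged: the condition reads rs before the link address is written, so executing the same BGEZAL twice can give different results. The register list and function name are hypothetical, and only 32-bit mode is modeled.

```python
MASK32 = 0xFFFFFFFF


def bgezal(regs: list, pc: int, rs: int, offset16: int) -> int:
    """Model BGEZAL (32-bit mode); return the address fetched after the delay slot.

    `regs` is a hypothetical list of 32 register values; `pc` is the address
    of the BGEZAL instruction itself.
    """
    taken = ((regs[rs] >> 31) & 1) == 0      # condition uses rs as it was at time T
    regs[31] = (pc + 8) & MASK32             # link register written unconditionally
    if taken:
        offset = ((offset16 & 0xFFFF) ^ 0x8000) - 0x8000
        return (pc + 4 + (offset << 2)) & MASK32
    return (pc + 8) & MASK32


# With rs = r31, re-executing the instruction sees the link address, not the
# original operand, so the branch decision changes.
regs = [0] * 32
regs[31] = 0xFFFFFFFF                        # -1: branch not taken
print(hex(bgezal(regs, 0x1000, 31, 2)))      # 0x1008
print(hex(bgezal(regs, 0x1000, 31, 2)))      # 0x100c (r31 now holds 0x1008)
```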


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 0)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 0)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        endif


Exceptions: 


None 
CPU Instruction Set Details


BGEZALL Branch On Greater Than Or Equal To Zero And Link Likely BGEZALL


31 26 25 21 20 16 15 0 
REGIMM rs BGEZALL offset 
000001 10011 
6 5 5 16 
Format: 


BGEZALL rs, offset 
Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended.
Unconditionally, the address of the instruction following the delay slot is stored in
the link register, r31. If the contents of general purpose register rs are greater
than or equal to 0, the program branches to the branch address with a delay of one
instruction. If it does not branch, the instruction in the delay slot is discarded.
General purpose register r31 should generally not be specified as rs, because
storing the link address overwrites the contents of rs, and the instruction may then
not be re-executable. However, executing this instruction with rs = r31 does not
cause any exception.


Operation: 
32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 0)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 0)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
Exceptions: 
None 
BGEZL Branch On Greater Than Or Equal To Zero Likely BGEZL


31 26 25 21 20 16 15 0 
REGIMM rs BGEZL offset 
000001 00011 
6 5 5 16 


Format: 
BGEZL rs, offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the
contents of general purpose register rs are equal to or larger than 0, then the
program branches to the branch address, with a delay of one instruction.


If it does not branch, the instruction in the branch delay slot is discarded. 
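The difference between an ordinary branch and a branch-likely instruction such as BGEZL lies only in what happens to the delay slot. The sketch below is a hypothetical helper that lists the addresses executed after the branch, up to the next fetch. It ignores exceptions and assumes the target has already been computed.

```python
def executed_after_branch(branch_pc: int, target: int, taken: bool, likely: bool) -> list:
    """Addresses executed after a branch, up to and including the next fetch.

    For an ordinary branch the delay slot (branch_pc + 4) always executes.
    For a branch-likely instruction the delay slot executes only when the
    branch is taken; otherwise it is nullified.
    """
    delay_slot = branch_pc + 4
    executed = []
    if taken or not likely:
        executed.append(delay_slot)
    executed.append(target if taken else branch_pc + 8)
    return executed


print([hex(a) for a in executed_after_branch(0x100, 0x200, False, likely=False)])  # ['0x104', '0x108']
print([hex(a) for a in executed_after_branch(0x100, 0x200, False, likely=True)])   # ['0x108']
```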


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 0)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 0)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif


Exceptions: 
None
BGTZ Branch On Greater Than Zero BGTZ


31 26 25 21 20 16 15 0 
BGTZ rs 0 offset 
000111 00000 
6 5 5 16 
Format: 


BGTZ rs, offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the
contents of general purpose register rs are greater than zero, the program
branches to the branch address with a delay of one instruction.


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
   T+1: if condition then
           PC ← PC + target
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
   T+1: if condition then
           PC ← PC + target
        endif


Exceptions: 


None 
BGTZL Branch On Greater Than Zero Likely BGTZL


31 26 25 21 20 16 15 0 
BGTZL rs 0 offset 
010111 00000 
6 5 5 16 
Format: 


BGTZL rs, offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the
contents of general purpose register rs are greater than 0, the program branches
to the branch address with a delay of one instruction.


If it does not branch, the instruction in the branch delay slot is discarded. 


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif


Exceptions: 


None 
BLEZ Branch On Less Than Or Equal To Zero BLEZ
31 26 25 21 20 16 15 0 
BLEZ rs 0 offset 
000110 00000 
6 5 5 16 
Format: 
BLEZ rs, offset 
Description: 
A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the
contents of general purpose register rs are less than or equal to 0, the program
branches to the branch address with a delay of one instruction.
Operation: 
32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
   T+1: if condition then
           PC ← PC + target
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
   T+1: if condition then
           PC ← PC + target
        endif
Exceptions: 
None 
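The compare-with-zero branches (BGEZ, BGTZ, BLEZ, BLTZ and their variants) test only the sign bit and whether the register is zero. The sketch below writes the 32-bit conditions exactly as the Operation sections state them and checks that they agree with ordinary signed comparisons. In particular, it shows why the BLEZ test must combine its two terms with "or". Function names are illustrative only.

```python
def signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def zero_compare_conditions(rs_value: int) -> dict:
    """Branch conditions as the 32-bit Operation sections write them."""
    v = rs_value & 0xFFFFFFFF
    sign = (v >> 31) & 1
    return {
        "BGEZ": sign == 0,
        "BGTZ": sign == 0 and v != 0,
        "BLEZ": sign == 1 or v == 0,
        "BLTZ": sign == 1,
    }


# The bit tests agree with signed comparisons against zero.
for sample in (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    s = signed32(sample)
    expected = {"BGEZ": s >= 0, "BGTZ": s > 0, "BLEZ": s <= 0, "BLTZ": s < 0}
    assert zero_compare_conditions(sample) == expected
```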
BLEZL Branch On Less Than Or Equal To Zero Likely BLEZL


31 26 25 21 20 16 15 0 
BLEZL rs 0 offset 
010110 00000 
6 5 5 16 


Format: 
BLEZL rs, offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the
contents of general purpose register rs are less than or equal to zero, the
program branches to the branch address with a delay of one instruction.


If it does not branch, the instruction in the branch delay slot is discarded. 


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif


Exceptions: 


None 
BLTZ Branch On Less Than Zero BLTZ
31 26 25 21 20 16 15 0 
REGIMM rs BLTZ offset
000001 00000 
6 5 5 16 
Format:
BLTZ rs, offset


Description:

A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the 
contents of general purpose register rs are smaller than 0, then the program 
branches to the branch address, with a delay of one instruction. 


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 1)
   T+1: if condition then
           PC ← PC + target
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 1)
   T+1: if condition then
           PC ← PC + target
        endif


Exceptions: 


None 
BLTZAL Branch On Less Than Zero And Link BLTZAL


31 26 25 21 20 16 15 0 
REGIMM rs BLTZAL offset 
000001 10000 
6 5 5 16 


Format: 
BLTZAL rs, offset


Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended.
Unconditionally, the address of the instruction following the delay slot is stored in
the link register, r31. If the contents of general purpose register rs are less than
0, the program branches to the branch address with a delay of one instruction.


General purpose register r31 should generally not be specified as rs, because
storing the link address overwrites the contents of rs, and the instruction is then
not re-executable. However, executing this instruction with rs = r31 does not
generate an exception.


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 1)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 1)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        endif


Exceptions: 


None 
BLTZALL Branch On Less Than Zero And Link Likely BLTZALL


31 26 25 21 20 16 15 0 
REGIMM rs BLTZALL offset 
000001 10010 
6 5 5 16 
Format: 
BLTZALL rs, offset 
Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended.
Unconditionally, the address of the instruction following the delay slot is stored in
the link register, r31. If the contents of general purpose register rs are less than
0, the program branches to the branch address with a delay of one instruction.

If it does not branch, the instruction in the branch delay slot is discarded.
General purpose register r31 should generally not be specified as rs, because
storing the link address overwrites the contents of rs, and the instruction is then
not re-executable. However, executing this instruction with rs = r31 does not cause
an exception.


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 1)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 1)
        GPR[31] ← PC + 8
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif


Exceptions: 


None 
BLTZL Branch On Less Than Zero Likely BLTZL


31 26 25 21 20 16 15 0 
REGIMM rs BLTZL offset 
000001 00010 
6 5 5 16 
Format: 


BLTZL rs, offset 


Description: 


A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. If the
contents of general purpose register rs are less than 0, the program branches to
the branch address with a delay of one instruction.


If it does not branch, the instruction in the branch delay slot is discarded.


Operation: 


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs]_31 = 1)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs]_63 = 1)
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif


Exceptions: 


None 
BNE Branch On Not Equal BNE
31 26 25 21 20 16 15 0 
BNE rs rt offset 
000101 
6 5 5 16 
Format: 
BNE rs, rt, offset 
Description: 
A branch address is calculated from the sum of the address of the instruction in the 
delay slot and the 16-bit offset, shifted two bits left and sign-extended. The 
contents of general purpose register rs and the contents of general purpose register 
rt are compared. If the two registers are not equal, then the program branches to 
the branch address, with a delay of one instruction. 
Operation: 
32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs] ≠ GPR[rt])
   T+1: if condition then
           PC ← PC + target
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs] ≠ GPR[rt])
   T+1: if condition then
           PC ← PC + target
        endif
Exceptions: 
None 
BNEL Branch On Not Equal Likely BNEL


31 26 25 21 20 16 15 0
BNEL rs rt offset
010101
6 5 5 16
Format:
BNEL rs, rt, offset


Description:

A branch address is calculated from the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted two bits left and sign-extended. The
contents of general purpose register rs and the contents of general purpose register
rt are compared. If the two registers are not equal, then the program branches to
the branch address, with a delay of one instruction.


If it does not branch, the instruction in the branch delay slot is discarded.


Operation:


32 T:   target ← (offset_15)^14 || offset || 0^2
        condition ← (GPR[rs] ≠ GPR[rt])
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif
64 T:   target ← (offset_15)^46 || offset || 0^2
        condition ← (GPR[rs] ≠ GPR[rt])
   T+1: if condition then
           PC ← PC + target
        else
           NullifyCurrentInstruction
        endif


Exceptions: 


None 
BREAK Breakpoint BREAK


31 26 25 6 5 0
SPECIAL code BREAK 
000000 001101 

6 20 6 
Format: 
BREAK 
Description: 


A breakpoint exception occurs after execution of this instruction, transferring 
control to the exception handler. 


The code field can be used to pass parameters to the exception handler. The handler
can retrieve the parameter only by loading the contents of the memory word that
contains the instruction as data.
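The sketch below shows how the 20-bit code field sits in a BREAK word (bits 25..6, between the SPECIAL opcode and the BREAK function code from the encoding diagram) and how a handler could extract it once it has loaded the instruction word as data. Locating that word in memory is not shown, and both helper names are hypothetical.

```python
SPECIAL = 0b000000
BREAK_FUNCT = 0b001101


def encode_break(code: int) -> int:
    """Build a BREAK instruction word with a 20-bit code in bits 25..6."""
    if not 0 <= code < (1 << 20):
        raise ValueError("code must fit in 20 bits")
    return (SPECIAL << 26) | (code << 6) | BREAK_FUNCT


def break_code(word: int) -> int:
    """Extract the code field from an instruction word loaded as data."""
    if (word >> 26) != SPECIAL or (word & 0x3F) != BREAK_FUNCT:
        raise ValueError("not a BREAK instruction")
    return (word >> 6) & 0xFFFFF


word = encode_break(7)
print(hex(word), break_code(word))  # 0x1cd 7
```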


Operation: 


32, 64 T: BreakpointException


Exceptions: 


Breakpoint exception 
CACHE Cache Operation CACHE


31 26 25 21 20 16 15 0 
CACHE base op offset 
101111 
6 5 5 16 
Format: 


CACHE op, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to form a virtual address. The virtual address is translated to a
physical address using the TLB, and the 5-bit sub-opcode op specifies the cache
operation to perform for that address.


If CP0 is not usable in User or Supervisor mode (the CP0 enable bit CU0 in the
Status register is cleared), executing this instruction causes a coprocessor
unusable exception. The operation of this instruction on any cache/operation
combination not listed below, or on a secondary cache, which the VR4300 does not
provide, is undefined. The operation of this instruction on an uncached area is also
undefined.


The Index operation uses part of the virtual address to specify a cache block. For
example, for a cache of 2^CACHEBITS bytes with 2^LINEBITS bytes per tag,
vAddr_CACHEBITS..LINEBITS specifies the block.
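As a rough illustration of the Index addressing described above, the function below extracts the block index from a virtual address. It assumes the index field runs from bit CACHEBITS - 1 down to bit LINEBITS, which selects one of 2^(CACHEBITS - LINEBITS) blocks. The cache size and line size in the example are illustrative values, not figures taken from this section.

```python
def cache_index(vaddr: int, cache_bits: int, line_bits: int) -> int:
    """Block index used by an Index-type CACHE operation.

    Assumes the index is vAddr bits (cache_bits - 1) down to line_bits,
    which selects one of 2**(cache_bits - line_bits) blocks.
    """
    return (vaddr >> line_bits) & ((1 << (cache_bits - line_bits)) - 1)


# Illustrative sizes only: a 16 KB cache (CACHEBITS = 14) with 32-byte lines (LINEBITS = 5).
print(cache_index(0x80004020, 14, 5))                                  # 1
print(cache_index(0x80000000, 14, 5), cache_index(0x80004000, 14, 5))  # 0 0
```

Addresses that are 2^CACHEBITS bytes apart map to the same index. This is why an Index operation can reach every block by using addresses from any sufficiently large range, for example an unmapped one.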


The Hit operation accesses the cache as normal data references, and performs the 
specified cache operation only if the cache contains valid data of the specified 
physical address (a hit). If data is not in the cache (a miss), the cache operation is 
not executed. 


Write back from a cache goes to the main memory. The address in the main 
memory to be written is the address in the cache tag and not the physical address 
translated by using TLB. 


The TLB miss exception and TLB invalid exception may occur when any cache
operation is performed. To prevent TLB exceptions, execute an Index* operation with
an address in an unmapped area. The Index operation never generates the TLB change
exception. Bits 16 and 17 of the instruction code select the cache that is the
subject of the operation, as follows.


Code Symbol Cache 
0 I instruction cache 
1 D data cache 
2 - reserved 
3 - reserved 


* Although a physical address is used to index the cache, it does not have to
coincide with the cache tag.


Bits 20:18 of this instruction specify the contents of the cache operation. For 
details, refer to the following pages. 
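The field layout described above (base in bits 25..21, the 5-bit op in bits 20..16 with the cache select in bits 17..16 and the operation in bits 20..18, and a 16-bit offset) can be decoded as follows. This is a sketch only, and the function and dictionary names are hypothetical.

```python
CACHE_OPCODE = 0b101111
CACHE_SELECT = {0: "I", 1: "D", 2: "reserved", 3: "reserved"}  # bits 17..16


def decode_cache(word: int) -> dict:
    """Split a CACHE instruction word into base, cache select, operation, and offset."""
    if (word >> 26) != CACHE_OPCODE:
        raise ValueError("not a CACHE instruction")
    op = (word >> 16) & 0x1F                 # 5-bit op field, bits 20..16
    offset = word & 0xFFFF
    return {
        "base": (word >> 21) & 0x1F,
        "cache": CACHE_SELECT[op & 0b11],    # instruction bits 17..16
        "operation": op >> 2,                # instruction bits 20..18
        "offset": offset - 0x10000 if offset & 0x8000 else offset,
    }


# Operation 0 on the data cache, offset -16 from register 4.
print(decode_cache(0xBC81FFF0))
# {'base': 4, 'cache': 'D', 'operation': 0, 'offset': -16}
```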


op 4..2  Cache  Name                         Operation
0        I      Index_Invalidate             Set the cache state of the cache block to Invalid.
0        D      Index_Write_Back_Invalidate  Examine the cache state of the data cache block at
                                             the index specified by the virtual address. If the
                                             state is not Invalid, write back the block to main
                                             memory. The address to write is taken from the
                                             cache tag. Set the cache state of the cache block
                                             to Invalid.
1        I, D   Index_Load_Tag               Read the tag for the cache block at the specified
                                             index and place it into the TagLo register of the
                                             CP0.
2        I, D   Index_Store_Tag              Write the contents of the TagLo register of the CP0
                                             to the tag for the cache block at the specified
                                             index.
3        D      Create_Dirty_Exclusive       Use this operation to avoid loading data needlessly
                                             from main memory when new data is to be written
                                             into the entire cache block. If the cache block
                                             does not contain the specified address and the
                                             block is dirty, write it back to main memory. In
                                             all cases, set the cache block tag to the specified
                                             physical address and set the cache state to dirty.
4        I, D   Hit_Invalidate               If the cache block contains the specified address,
                                             set the cache block state to Invalid.
5        D      Hit_Write_Back_Invalidate    If the cache block contains the specified address,
                                             write back the data if it is dirty, and set the
                                             cache block state to Invalid.
5        I      Fill                         Fill the instruction cache block with data from
                                             main memory.
6        D      Hit_Write_Back               If the cache block contains the specified address
                                             and the cache state is dirty, write back the data
                                             to main memory.
6        I      Hit_Write_Back               If the cache block contains the specified address,
                                             write back the data unconditionally.


Operation: 


32, 64 T: vAddr ← ((offset_15)^48 || offset_15..0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          CacheOp (op, vAddr, pAddr)


Exceptions: 


Coprocessor unusable exception 
TLB invalid exception 

TLB miss exception 

Bus error exception 

Address error exception 
CFCz Move Control From Coprocessor z CFCz


31 26 25 21 20 16 15 11 10 0 
COPz CF rt rd 0 
0100xx* 00010 0000000 0000 
6 5 5 5 11 
Format: 
CFCz rt, rd 
Description: 


The contents of coprocessor control register rd of CPz are loaded to general 
purpose register rt. 


This instruction is not valid for CP0.


Operation: 

32 T:   data ← CCR[z, rd]
   T+1: GPR[rt] ← data

64 T:   data ← (CCR[z, rd]_31)^32 || CCR[z, rd]
   T+1: GPR[rt] ← data


Exceptions: 


Coprocessor unusable exception 


Opcode Bit Encoding: 


Bit #        31 30 29 28 27 26 25 24 23 22 21 ... 0
CFC1          0  1  0  0  0  1  0  0  0  1  0
CFC2          0  1  0  0  1  0  0  0  0  1  0

Bits 31..28: Opcode; bits 27..26: Coprocessor Number; bits 25..21: Coprocessor Sub-opcode


* Refer to 16.7 CPU Instruction Opcode Bit Encoding. 
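The bit table above can be checked by assembling instruction words directly. The helper below is a hypothetical sketch that packs opcode 0100zz, the sub-opcode, rt (bits 20..16) and rd (bits 15..11). It also covers CTCz, described later in this chapter, which uses the same layout with sub-opcode 00110. It accepts only z = 1 or 2, matching the rows listed in the table.

```python
CF = 0b00010  # CFCz sub-opcode (bits 25..21)
CT = 0b00110  # CTCz sub-opcode (bits 25..21)


def encode_cop_move(sub_opcode: int, z: int, rt: int, rd: int) -> int:
    """Encode CFCz or CTCz for coprocessor z (1 or 2, as listed in the table)."""
    if z not in (1, 2):
        raise ValueError("only z = 1 or 2 is modeled; these instructions are not valid for CP0")
    opcode = 0b010000 | z                    # 0100zz: z occupies bits 27..26
    return (opcode << 26) | (sub_opcode << 21) | (rt << 16) | (rd << 11)


print(hex(encode_cop_move(CF, 1, 2, 31)))  # 0x4442f800  (CFC1, rt = 2, rd = 31)
print(hex(encode_cop_move(CT, 1, 2, 31)))  # 0x44c2f800  (CTC1, rt = 2, rd = 31)
```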
COPz Coprocessor z Operation COPz


31 26 25 24 0 
COPz CO cofun 
0100xx* 1
6 1 25 
Format: 
COPz cofun 
Description: 


A coprocessor operation is performed. The operation may specify and reference
internal coprocessor registers, and may change the state of the coprocessor
condition line, but does not modify state within the processor or the cache/main
memory. For details of coprocessor operations, refer to Chapter 17 FPU
Instruction Set Details.
Operation: 


32,64 T: CoprocessorOperation (z, cofun) 


Exceptions: 


Coprocessor unusable exception 
Floating-point exception (CP1 only) 


Opcode Bit Encoding: 


Bit #        31 30 29 28 27 26 25
COP0          0  1  0  0  0  0  1
COP1          0  1  0  0  0  1  1
COP2          0  1  0  0  1  0  1

Bits 31..28: Opcode; bits 27..26: Coprocessor Number; bit 25: Coprocessor Sub-opcode


* Refer to 16.7 CPU Instruction Opcode Bit Encoding. 


User's Manual U10504EJ7VOUM00 




Chapter 16 


CTCz Move Control To Coprocessor z CTCz 


31 26 25 21 20 16 15 11 10 0 
COPz CT rt rd 0
0100xx* 00110 0000000 0000
6 5 5 5 11 
Format: 
CTCz rt, rd 
Description: 


The contents of general purpose register rt are loaded into coprocessor control
register rd of CPz. This instruction is not valid for CP0.
Operation: 


32, 64 T:   data ← GPR[rt]
       T+1: CCR[z, rd] ← data


Exceptions: 


Coprocessor unusable exception 


Opcode Bit Encoding: 
Bit #        31 30 29 28 27 26 25 24 23 22 21 ... 0
CTC1          0  1  0  0  0  1  0  0  1  1  0
CTC2          0  1  0  0  1  0  0  0  1  1  0

Bits 31..28: Opcode; bits 27..26: Coprocessor Number; bits 25..21: Coprocessor Sub-opcode


* Refer to 16.7 CPU Instruction Opcode Bit Encoding. 




DADD Doubleword Add DADD 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DADD
000000 00000 101100 

6 5 5 5 5 6 
Format: 


DADD rd, rs, rt 


Description: 


The contents of general purpose register rs and the contents of general purpose 
register rt are added, and the result is stored in general purpose register rd. An 
integer overflow exception occurs if the carries out of bits 62 and 63 differ (2’s 
complement overflow). The contents of the destination register rd are not 
modified when an integer overflow exception occurs. 
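The overflow rule above (the carries out of bits 62 and 63 differ) can be expressed directly. In this sketch a Python exception stands in for the integer overflow exception, and the caller's destination value is left untouched when it is raised, mirroring the statement that rd is not modified. The names are illustrative only.

```python
MASK64 = (1 << 64) - 1
LOW63 = (1 << 63) - 1


class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception in this model."""


def dadd(rs_value: int, rt_value: int) -> int:
    """Return GPR[rs] + GPR[rt] as a 64-bit pattern, or raise on 2's complement overflow."""
    a, b = rs_value & MASK64, rt_value & MASK64
    carry_out_62 = ((a & LOW63) + (b & LOW63)) >> 63   # carry from bit 62 into bit 63
    carry_out_63 = (a + b) >> 64                       # carry out of bit 63
    if carry_out_62 != carry_out_63:
        raise IntegerOverflow("carries out of bits 62 and 63 differ")
    return (a + b) & MASK64


print(hex(dadd(MASK64, 1)))          # 0x0  (-1 + 1: no overflow)
try:
    dadd(0x7FFFFFFFFFFFFFFF, 1)      # largest positive value + 1
except IntegerOverflow as exc:
    print("overflow:", exc)
```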


This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.


Operation: 


64 T: GPR[rd] ← GPR[rs] + GPR[rt]


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Integer overflow exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
DADDI Doubleword Add Immediate DADDI


31 26 25 21 20 16 15 0 
DADDI rs rt immediate 
011000 
6 5 5 16 
Format: 


DADDI rt, rs, immediate 


Description: 


The 16-bit immediate is sign-extended and added to the contents of general 
purpose register rs, and the result is stored in general purpose register rt. An 
integer overflow exception occurs if carries out of bits 62 and 63 differ (2’s 
complement overflow). The contents of the destination register rt are not 
modified when an integer overflow exception occurs. 


This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.


Operation: 


64 T: GPR[rt] ← GPR[rs] + ((immediate_15)^48 || immediate_15..0)


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Integer overflow exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
DADDIU Doubleword Add Immediate Unsigned DADDIU


31 26 25 21 20 16 15 0 
DADDIU rs rt immediate 
011001 
6 5 5 16 
Format: 


DADDIU rt, rs, immediate


Description: 


The 16-bit immediate is sign-extended and added to the contents of general 
purpose register rs, and the result is stored in general purpose register rt. 


This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.


The only difference between this instruction and the DADDI instruction is that the
DADDIU instruction never causes an integer overflow exception.
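The following minimal sketch shows the DADDIU operation: the 16-bit immediate is sign-extended, added, and the sum simply wraps modulo 2^64 instead of trapping. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(immediate: int) -> int:
    """(immediate_15)^48 || immediate_15..0, returned as a signed Python integer."""
    immediate &= 0xFFFF
    return immediate - 0x10000 if immediate & 0x8000 else immediate


def daddiu(rs_value: int, immediate: int) -> int:
    """GPR[rs] + sign-extended immediate, wrapped to 64 bits; never traps."""
    return (rs_value + sign_extend16(immediate)) & MASK64


print(hex(daddiu(10, 0xFFFE)))               # 0x8  (10 + -2)
print(hex(daddiu(0x7FFFFFFFFFFFFFFF, 1)))    # 0x8000000000000000 (DADDI would overflow here)
```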


Operation: 


64 T: GPR[rt] ← GPR[rs] + ((immediate_15)^48 || immediate_15..0)


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
DADDU Doubleword Add Unsigned DADDU


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DADDU 
000000 00000 101101 
6 5 5 5 5 6 

Format: 


DADDU rd, rs, rt 


Description: 


The contents of general purpose register rs and the contents of general purpose 
register rt are added, and the result is stored in general purpose register rd. 


This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.


The only difference between this instruction and the DADD instruction is that the
DADDU instruction never causes an integer overflow exception.


Operation: 


64 T: GPR[rd] ← GPR[rs] + GPR[rt]


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
DDIV Doubleword Divide DDIV


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DDIV 
000000 0000000000 011110

6 5 5 10 6 
Format: 
DDIV rs, rt 
Description: 
The contents of general purpose register rs are divided by the contents of general
purpose register rt, treating both operands as signed integers. An integer overflow
exception never occurs, and the result of this operation is undefined when the
divisor is zero.
This instruction is usually executed after additional instructions that check for a
zero divisor and for overflow.
When the operation completes, the quotient (doubleword) is loaded into special
register LO, and the remainder (doubleword) is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of those
instructions are undefined. To obtain the correct result, insert two or more
additional instructions between the MFHI or MFLO and the DDIV instruction.
This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.
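The sketch below models the LO/HI results of DDIV. It assumes that "div" truncates toward zero and that "mod" takes the sign of the dividend. That is the conventional MIPS behavior, but this section does not define the operators, so treat it as an assumption. Python's own `//` rounds toward negative infinity, which is why the quotient is built from absolute values.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def ddiv(rs_value: int, rt_value: int) -> tuple:
    """Return (LO, HI) bit patterns for DDIV, assuming truncation toward zero."""
    n, d = to_signed64(rs_value), to_signed64(rt_value)
    if d == 0:
        raise ZeroDivisionError("DDIV result is undefined for a zero divisor")
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q                    # truncate toward zero, unlike Python's //
    r = n - q * d                 # remainder takes the sign of the dividend
    # -2**63 / -1 does not fit in 64 bits; this sketch simply wraps it.
    return q & MASK64, r & MASK64


lo, hi = ddiv(-7 & MASK64, 2)
print(to_signed64(lo), to_signed64(hi))  # -3 -1
```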
Operation: 
64 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   LO ← GPR[rs] div GPR[rt]
        HI ← GPR[rs] mod GPR[rt]
Remark Same operation in the 32-bit Kernel mode. 
Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
DDIVU Doubleword Divide Unsigned DDIVU


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DDIVU 
000000 0000000000 011111 

6 5 5 10 6 
Format: 
DDIVU rs, rt 
Description: 


The contents of general purpose register rs are divided by the contents of general 
purpose register rt, treating both operands as unsigned integers. An integer 
overflow exception never occurs, and the result of this operation is undefined 
when the divisor is zero. 


This instruction is executed after instructions that check for a zero divisor.


When the operation completes, the quotient (doubleword) is stored into special 
register LO, and the remainder (doubleword) is stored into special register HI. 


If either of the two preceding instructions is MFHI or MFLO, the results of those 
instructions are undefined. To obtain the correct result, insert two or more 
instructions in between the MFHI or MFLO and DDIVU instructions. 
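The scheduling rule above (no MFHI or MFLO among the two instructions immediately before the divide) can be checked mechanically. This hypothetical sketch scans a straight-line list of mnemonics and reports the positions of divides that violate the rule. It ignores branches and considers only the four divide instructions in this section.

```python
DIVIDES = {"DIV", "DIVU", "DDIV", "DDIVU"}
HI_LO_READS = {"MFHI", "MFLO"}


def hi_lo_hazards(mnemonics: list) -> list:
    """Indices of divides with MFHI or MFLO among the two preceding instructions."""
    hazards = []
    for i, name in enumerate(mnemonics):
        window = mnemonics[max(0, i - 2):i]
        if name in DIVIDES and any(prev in HI_LO_READS for prev in window):
            hazards.append(i)
    return hazards


print(hi_lo_hazards(["MFLO", "NOP", "DDIVU"]))         # [2]
print(hi_lo_hazards(["MFLO", "NOP", "NOP", "DDIVU"]))  # []
```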


This operation is only defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.


Operation: 
64 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   LO ← (0 || GPR[rs]) div (0 || GPR[rt])
        HI ← (0 || GPR[rs]) mod (0 || GPR[rt])


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
DIV Divide DIV


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DIV
000000 0000000000 011010
6 5 5 10 6 
Format: 
DIV rs, rt 
Description: 


The contents of general purpose register rs are divided by the contents of general
purpose register rt, treating both operands as signed integers. An overflow
exception never occurs, and the result of this operation is undefined when the
divisor is zero. In 64-bit mode, the operands must be sign-extended 32-bit values.


This instruction is usually executed after instructions that check for a zero
divisor and for overflow.


When the operation completes, the quotient (word) is stored into special register
LO, and the remainder (word) is stored into special register HI. In 64-bit mode,
both results are sign-extended to 64 bits.


If either of the two preceding instructions is MFHI or MFLO, the results of those 
instructions are undefined. To obtain the correct result, insert two or more 
additional instructions in between the MFHI or MFLO and DIV instructions. 


Operation: 
32 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   LO ← GPR[rs] div GPR[rt]
        HI ← GPR[rs] mod GPR[rt]
64 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   q ← GPR[rs]_31..0 div GPR[rt]_31..0
        r ← GPR[rs]_31..0 mod GPR[rt]_31..0
        LO ← (q_31)^32 || q_31..0
        HI ← (r_31)^32 || r_31..0
Exceptions: 
None 
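The 64-bit-mode operation of DIV above divides the low words and sign-extends both results into LO and HI. The sketch below reproduces that. As with the DDIV example, it assumes truncation toward zero for "div" and a remainder with the dividend's sign, since this section does not define those operators. Function names are illustrative.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value >> 31 else value


def sign_extend_to_64(word: int) -> int:
    """(x_31)^32 || x_31..0 as a 64-bit pattern."""
    return signed32(word) & MASK64


def div_64bit_mode(rs_value: int, rt_value: int) -> tuple:
    """Return (LO, HI) for DIV in 64-bit mode, assuming truncation toward zero."""
    n, d = signed32(rs_value), signed32(rt_value)   # only bits 31..0 are used
    if d == 0:
        raise ZeroDivisionError("DIV result is undefined for a zero divisor")
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    r = n - q * d
    return sign_extend_to_64(q), sign_extend_to_64(r)


lo, hi = div_64bit_mode(0xFFFFFFFFFFFFFFF9, 2)       # -7 / 2
print(hex(lo), hex(hi))  # 0xfffffffffffffffd 0xffffffffffffffff
```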
DIVU Divide Unsigned DIVU


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DIVU 
000000 0000000000 011011
6 5 5 10 6 
Format: 
DIVU rs, rt
Description: 


The contents of general purpose register rs are divided by the contents of general
purpose register rt, treating both operands as unsigned integers. An integer
overflow exception never occurs, and the result of this operation is undefined
when the divisor is zero. In 64-bit mode, the operands must be sign-extended 32-bit
values.


This instruction is executed after instructions that check for a zero divisor.


When the operation completes, the quotient (word) is stored into special register
LO, and the remainder (word) is stored into special register HI. In 64-bit mode,
both results are sign-extended to 64 bits.


If either of the two preceding instructions is MFHI or MFLO, the results of those 
instructions are undefined. To obtain the correct result, insert two or more 
additional instructions in between the MFHI or MFLO and DIVU instructions. 


Operation: 
32 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   LO ← (0 || GPR[rs]) div (0 || GPR[rt])
        HI ← (0 || GPR[rs]) mod (0 || GPR[rt])
64 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   q ← (0 || GPR[rs]_31..0) div (0 || GPR[rt]_31..0)
        r ← (0 || GPR[rs]_31..0) mod (0 || GPR[rt]_31..0)
        LO ← (q_31)^32 || q_31..0
        HI ← (r_31)^32 || r_31..0
Exceptions: 
None 
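In 64-bit mode, DIVU zero-extends the low words before dividing and then sign-extends the 32-bit results. As a consequence, a quotient with bit 31 set appears negative in LO. The sketch below models the operation above. Because both operands are non-negative, Python's `divmod` matches the unsigned div and mod. Names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sign_extend_to_64(word: int) -> int:
    """(x_31)^32 || x_31..0 for a 32-bit value x."""
    word &= MASK32
    return (word | (MASK64 ^ MASK32)) if word >> 31 else word


def divu_64bit_mode(rs_value: int, rt_value: int) -> tuple:
    """Return (LO, HI) for DIVU in 64-bit mode."""
    n = rs_value & MASK32          # 0 || GPR[rs]_31..0: zero-extended operand
    d = rt_value & MASK32
    if d == 0:
        raise ZeroDivisionError("DIVU result is undefined for a zero divisor")
    q, r = divmod(n, d)            # both non-negative, so divmod matches div/mod
    return sign_extend_to_64(q), sign_extend_to_64(r)


lo, hi = divu_64bit_mode(0xFFFFFFFF, 1)
print(hex(lo), hex(hi))  # 0xffffffffffffffff 0x0
```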
DMFC0 Doubleword Move From System Control Coprocessor DMFC0


31 26 25 21 20 16 15 1110 0 
COP0 DMF rt rd 0
010000 00001 0000000 0000
6 5 5 5 11 
Format: 
DMFC0 rt, rd
Description: 


The contents of coprocessor register rd of the CP0 are loaded into general purpose
register rt.


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception. The contents of the source coprocessor
register rd are written to the 64-bit destination general purpose register rt. The
operation of the DMFC0 instruction on a 32-bit register of the CP0 is undefined.


Operation: 


64 T:   data ← CPR[0, rd]
   T+1: GPR[rt] ← data


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Coprocessor unusable exception (VR4300 in 64-/32-bit User mode and 
Supervisor mode if CP0 is disabled) 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DMTC0 Doubleword Move To System Control Coprocessor DMTC0 


31 26 25 21 20 16 15 11 10 0 
COP0 DMT rt rd 0 
010000 00101 00000000000 
6 5 5 5 11 
Format: 
DMTC0 rt, rd 
Description: 


The contents of general purpose register rt are loaded into coprocessor register rd 
of the CP0. 


This operation is defined for the VR4300 operating in 64-bit mode or in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


The contents of the source general purpose register rt are written to the 64-bit 
destination coprocessor register rd. The operation of the DMTC0 instruction on a 
32-bit register of the CP0 is undefined. 


Because this instruction may alter the state of the virtual address translation 
system, the operation of load instructions, store instructions, and TLB operations 
immediately before and after this instruction is undefined. 


Operation: 


64 T:   data <- GPR[rt] 
   T+1: CPR[0, rd] <- data 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Coprocessor unusable exception (VR4300 in 64-/32-bit User and Supervisor 
mode if CP0 is disabled) 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 


DMULT Doubleword Multiply DMULT 


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DMULT 
000000 000000 0000 011100 

6 5 5 10 6 
Format: 
DMULT rs, rt 
Description: 


The contents of general purpose registers rs and rt are multiplied, treating both 
operands as signed integers. An integer overflow exception never occurs. 


When the operation completes, the low-order doubleword is stored into special 
register LO, and the high-order doubleword is stored into special register HI. 


If either of the two preceding instructions is MFHI or MFLO, the results of these 
instructions are undefined. To obtain the correct result, insert two or more other 
instructions in between the MFHI or MFLO and DMULT instructions. 


This operation is only defined for the VR4300 operating in 64-bit mode and in 
32-bit Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 
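As a sketch (not the processor's implementation), the signed 128-bit product and its split into LO and HI can be modeled in Python. The sketch assumes register contents are unsigned 64-bit integers. `to_signed64` and `dmult` are hypothetical names, and pipeline timing is ignored.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def dmult(gpr_rs: int, gpr_rt: int) -> tuple[int, int]:
    """Hypothetical model of DMULT; returns (LO, HI) as unsigned 64-bit patterns."""
    # t <- GPR[rs] * GPR[rt], kept as a 128-bit two's-complement pattern
    t = (to_signed64(gpr_rs) * to_signed64(gpr_rt)) & MASK128
    return t & MASK64, t >> 64  # LO <- t63..0, HI <- t127..64
```

For example, multiplying all ones (-1) by 2 gives LO = 0xFFFF_FFFF_FFFF_FFFE and HI = all ones, the sign extension of -2.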


Operation: 
64 T-2: LO <- undefined 
        HI <- undefined 
   T-1: LO <- undefined 
        HI <- undefined 
   T:   t <- GPR[rs] * GPR[rt] 
        LO <- t63...0 
        HI <- t127...64 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DMULTU Doubleword Multiply Unsigned DMULTU 


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 DMULTU 
000000 000000 0000 011101 

6 5 5 10 6 
Format: 


DMULTU rs, rt 


Description: 


The contents of general purpose register rs and the contents of general purpose 
register rt are multiplied, treating both operands as unsigned integers. An 
overflow exception never occurs. 


When the operation completes, the low-order doubleword is stored into special 
register LO, and the high-order doubleword is stored into special register HI. 


If either of the two preceding instructions is MFHI or MFLO, the results of these 
instructions are undefined. To obtain the correct result, insert two or more other 
instructions in between the MFHI or MFLO and DMULTU instructions. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 
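A comparable hypothetical sketch for DMULTU, again modeling registers as unsigned 64-bit Python integers and ignoring pipeline timing. Multiplying all ones by 2 shows the difference from DMULT: the operand is treated as 2^64 - 1 rather than -1, so HI becomes 1 instead of all ones.

```python
MASK64 = (1 << 64) - 1


def dmultu(gpr_rs: int, gpr_rt: int) -> tuple[int, int]:
    """Hypothetical model of DMULTU; returns (LO, HI)."""
    # t <- (0 || GPR[rs]) * (0 || GPR[rt]); always fits in 128 bits
    t = (gpr_rs & MASK64) * (gpr_rt & MASK64)
    return t & MASK64, t >> 64  # LO <- t63..0, HI <- t127..64
```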


Operation: 


64 T-2: LO <- undefined 
        HI <- undefined 
   T-1: LO <- undefined 
        HI <- undefined 
   T:   t <- (0 || GPR[rs]) * (0 || GPR[rt]) 
        LO <- t63...0 
        HI <- t127...64 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 


DSLL Doubleword Shift Left Logical DSLL 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL 0 rt rd sa DSLL 
000000 00000 111000 
6 5 5 5 5 6 

Format: 


DSLL rd, rt, sa 


Description: 


The contents of general purpose register rt are shifted left by sa bits, inserting 
zeros into the low-order bits. The result is stored in general purpose register rd. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 
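The shift-amount encodings can be sketched in Python, assuming 64-bit registers held as unsigned integers. `dsll` is a hypothetical helper. Its `plus32` flag models DSLL32, whose shift amount is formed as 1 || sa (that is, 32 + sa) instead of 0 || sa.

```python
MASK64 = (1 << 64) - 1


def dsll(gpr_rt: int, sa: int, plus32: bool = False) -> int:
    """Hypothetical model of DSLL (plus32=False) and DSLL32 (plus32=True)."""
    if not 0 <= sa <= 31:
        raise ValueError("sa is a 5-bit field")
    s = (0b100000 if plus32 else 0) | sa  # s <- 0 || sa, or 1 || sa
    return (gpr_rt << s) & MASK64  # GPR[rt](63-s)..0 || 0^s
```

Bits shifted past bit 63 are discarded by the final mask, and zeros fill the low-order bits.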


Operation: 


64 T: s <- 0 || sa 
      GPR[rd] <- GPR[rt](63-s)...0 || 0^s 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DSLLV Doubleword Shift Left Logical Variable DSLLV 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DSLLV 
000000 00000 010100 
6 5 5 5 5 6 

Format: 


DSLLV rd, rt, rs 


Description: 


The contents of general purpose register rt are shifted left by the number of bits 
specified by the low-order six bits contained in general purpose register rs, 
inserting zeros into the low-order bits. The result is stored in general purpose 
register rd. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 
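A hypothetical one-function sketch of the variable form, under the same unsigned 64-bit modeling assumption. It shows that only the low-order six bits of rs matter, so a shift count of 64 behaves like a count of 0.

```python
MASK64 = (1 << 64) - 1


def dsllv(gpr_rt: int, gpr_rs: int) -> int:
    """Hypothetical model of DSLLV: the shift amount comes from GPR[rs]5..0."""
    s = gpr_rs & 0x3F  # only the low-order six bits of rs are used
    return (gpr_rt << s) & MASK64  # GPR[rt](63-s)..0 || 0^s
```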


Operation: 


64 T: s <- GPR[rs]5...0 
      GPR[rd] <- GPR[rt](63-s)...0 || 0^s 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DSLL32 Doubleword Shift Left Logical + 32 DSLL32 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL 0 rt rd sa DSLL32 
000000 00000 111100 
6 5 5 5 5 6 

Format: 
DSLL32 rd, rt, sa 

Description: 
The contents of general purpose register rt are shifted left by 32+sa bits, inserting 
zeros into the low-order bits. The result is stored in general purpose register rd. 
This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 

Operation: 

64 T: s <- 1 || sa 
      GPR[rd] <- GPR[rt](63-s)...0 || 0^s 

Remark Same operation in the 32-bit Kernel mode. 

Exceptions: 

Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 

DSRA Doubleword Shift Right Arithmetic DSRA 
31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL 0 rt rd sa DSRA 
000000 00000 111011 
6 5 5 5 5 6 
Format: 
DSRA rd, rt, sa 

Description: 


The contents of general purpose register rt are shifted right by sa bits, sign- 
extending the high-order bits. The result is stored in general purpose register rd. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 
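An illustrative Python sketch of the arithmetic right shifts, assuming registers are modeled as unsigned 64-bit integers. The hypothetical `plus32` flag selects DSRA32 (shift by 32 + sa). The sign extension relies on Python's `>>`, which is arithmetic for negative integers.

```python
MASK64 = (1 << 64) - 1


def dsra(gpr_rt: int, sa: int, plus32: bool = False) -> int:
    """Hypothetical model of DSRA (plus32=False) and DSRA32 (plus32=True)."""
    if not 0 <= sa <= 31:
        raise ValueError("sa is a 5-bit field")
    s = (0b100000 if plus32 else 0) | sa  # s <- 0 || sa, or 1 || sa
    value = gpr_rt & MASK64
    if value & (1 << 63):
        value -= 1 << 64  # view the register as a negative number
    # (GPR[rt]63)^s || GPR[rt]63..s: copies of bit 63 fill the vacated bits
    return (value >> s) & MASK64
```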


Operation: 


64 T: s <- 0 || sa 
      GPR[rd] <- (GPR[rt]63)^s || GPR[rt]63...s 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 


DSRAV Doubleword Shift Right Arithmetic Variable DSRAV 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DSRAV 
000000 00000 010111 
6 5 5 5 5 6 
Format: 


DSRAV rd, rt, rs 


Description: 


The contents of general purpose register rt are shifted right by the number of bits 
specified by the low-order six bits of general purpose register rs, sign-extending 
the high-order bits. The result is stored in general purpose register rd. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


64 T: s <- GPR[rs]5...0 
      GPR[rd] <- (GPR[rt]63)^s || GPR[rt]63...s 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DSRA32 Doubleword Shift Right Arithmetic + 32 DSRA32 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL 0 rt rd sa DSRA32 
000000 00000 111111 
6 5 5 5 5 6 
Format: 


DSRA32 rd, rt, sa 


Description: 


The contents of general purpose register rt are shifted right by 32+sa bits, sign- 
extending the high-order bits. The result is stored in general purpose register rd. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


64 T: s <- 1 || sa 
      GPR[rd] <- (GPR[rt]63)^s || GPR[rt]63...s 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DSRL Doubleword Shift Right Logical DSRL 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL 0 rt rd sa DSRL 
000000 00000 111010 
6 5 5 5 5 6 

Format: 


DSRL rd, rt, sa 


Description: 


The contents of general purpose register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. The result is stored in general purpose register rd. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 
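By contrast with the arithmetic shifts, the logical right shift inserts zeros. Here is a hypothetical Python sketch under the usual assumption that registers are unsigned 64-bit integers. The `plus32` flag selects the +32 variant (DSRL32).

```python
MASK64 = (1 << 64) - 1


def dsrl(gpr_rt: int, sa: int, plus32: bool = False) -> int:
    """Hypothetical model of DSRL (plus32=False) and DSRL32 (plus32=True)."""
    if not 0 <= sa <= 31:
        raise ValueError("sa is a 5-bit field")
    s = (0b100000 if plus32 else 0) | sa  # s <- 0 || sa, or 1 || sa
    return (gpr_rt & MASK64) >> s  # 0^s || GPR[rt]63..s
```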


Operation: 


64 T: s <- 0 || sa 
      GPR[rd] <- 0^s || GPR[rt]63...s 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DSRLV Doubleword Shift Right Logical Variable DSRLV 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DSRLV 
000000 00000 010110 
6 5 5 5 5 6 

Format: 
DSRLV rd, rt, rs 

Description: 
The contents of general purpose register rt are shifted right by the number of bits 
specified by the low-order six bits of general purpose register rs, inserting zeros 
into the high-order bits. The result is stored in general purpose register rd. 
This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 

Operation: 

64 T: s <- GPR[rs]5...0 
      GPR[rd] <- 0^s || GPR[rt]63...s 

Remark Same operation in the 32-bit Kernel mode. 

Exceptions: 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DSRL32 Doubleword Shift Right Logical + 32 DSRL32 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL 0 rt rd sa DSRL32 
000000 00000 111110 
6 5 5 5 5 6 
Format: 


DSRL32 rd, rt, sa 


Description: 


The contents of general purpose register rt are shifted right by 32+sa bits, 
inserting zeros into the high-order bits. The result is stored in general purpose 
register rd. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


64 T: s <- 1 || sa 
      GPR[rd] <- 0^s || GPR[rt]63...s 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
DSUB Doubleword Subtract DSUB 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DSUB 
000000 00000 101110 
6 5 5 5 5 6 

Format: 
DSUB rd, rs, rt 
Description: 


The contents of general purpose register rt are subtracted from the contents of 
general purpose register rs, and the result is stored in general purpose register rd. 


An integer overflow exception takes place if the carries out of bits 62 and 63 differ 
(2’s complement overflow). The contents of destination register rd are not 
modified when an integer overflow exception occurs. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 
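The overflow rule can be checked with a signed range test, which is equivalent to comparing the carries out of bits 62 and 63. In the Python sketch below, `IntegerOverflow` is a hypothetical stand-in for the exception, and registers are modeled as unsigned 64-bit integers. A `dsubu` helper is included for contrast with DSUBU, which wraps silently.

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Hypothetical stand-in for the integer overflow exception."""


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def dsub(gpr_rs: int, gpr_rt: int) -> int:
    """Hypothetical model of DSUB; raises instead of writing rd on overflow."""
    result = to_signed64(gpr_rs) - to_signed64(gpr_rt)
    # The exact difference must fit in 64-bit two's complement.
    if not -(1 << 63) <= result < (1 << 63):
        raise IntegerOverflow("DSUB overflow; rd is left unmodified")
    return result & MASK64


def dsubu(gpr_rs: int, gpr_rt: int) -> int:
    """Hypothetical model of DSUBU: same arithmetic, but it never traps."""
    return (gpr_rs - gpr_rt) & MASK64
```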


Operation: 


64 T: GPR[rd] <- GPR[rs] - GPR[rt] 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Integer overflow exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 


DSUBU Doubleword Subtract Unsigned DSUBU 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 DSUBU 
000000 00000 101111 

6 5 5 5 5 6 
Format: 


DSUBU rd, rs, rt 


Description: 


The contents of general purpose register rt are subtracted from the contents of 
general purpose register rs, and the result is stored in general purpose register rd. 


The only difference between this instruction and the DSUB instruction is that 
DSUBU never causes an integer overflow exception. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


64 T: GPR[rd] <- GPR[rs] - GPR[rt] 


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
ERET Return From Exception ERET 


31 26 25 24 6 5 0 
COP0 CO 0 ERET 
010000 1 000 0000 0000 0000 0000 011000 
6 1 19 6 

Format: 
ERET 
Description: 


ERET is the VR4300 instruction for returning from an interrupt, exception, or 
error exception. Unlike a branch or jump instruction, ERET does not execute the 
next instruction (it has no delay slot). 


The ERET instruction must not itself be placed in a branch delay slot. 


If the ERL bit of the Status register is set (SR2 = 1), the contents of the 
ErrorEPC register are loaded into the PC and the ERL bit is cleared to zero. 
Otherwise (SR2 = 0), the PC is loaded from the EPC register and the EXL bit of 
the Status register is cleared to zero (SR1 = 0). 


An ERET instruction executed between an LL instruction and an SC instruction 
also causes the SC instruction to fail, because ERET clears the LL bit to zero. 
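The PC and Status updates can be sketched in Python as a simplified model under stated assumptions. `Cp0State` and `eret` are hypothetical names. The Status register is an integer with EXL at bit 1 and ERL at bit 2, the positions used by the SR31...3 || 0 || SR1...0 update in the Operation. Timing and the resulting mode change are not modeled.

```python
from dataclasses import dataclass

STATUS_EXL = 1 << 1  # Status register bit 1 (EXL)
STATUS_ERL = 1 << 2  # Status register bit 2 (ERL)


@dataclass
class Cp0State:
    """Hypothetical container for the CP0 state that ERET touches."""
    status: int
    epc: int
    error_epc: int
    ll_bit: int = 1


def eret(cp0: Cp0State) -> int:
    """Return the new PC and update Status and LLbit in place."""
    if cp0.status & STATUS_ERL:       # SR2 = 1: returning from an error exception
        new_pc = cp0.error_epc
        cp0.status &= ~STATUS_ERL     # SR <- SR31..3 || 0 || SR1..0
    else:                             # SR2 = 0: ordinary exception or interrupt
        new_pc = cp0.epc
        cp0.status &= ~STATUS_EXL     # SR <- SR31..2 || 0 || SR0
    cp0.ll_bit = 0                    # a following SC will fail
    return new_pc
```

With both ERL and EXL set, a first call returns ErrorEPC and clears only ERL. A second call then returns EPC and clears EXL.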


Operation: 


32, 64 T: if SR2 = 1 then 
              PC <- ErrorEPC 
              SR <- SR31...3 || 0 || SR1...0 
          else 
              PC <- EPC 
              SR <- SR31...2 || 0 || SR0 
          endif 
          LLbit <- 0 


Exceptions: 


Coprocessor unusable exception 
J Jump J 


31 26 25 0 


J target 
000010 


6 26 


Format: 


J target 


Description: 


The 26-bit target is shifted left two bits and combined with the high-order four bits 
of the address of the delay slot to calculate the target address. The program 
unconditionally jumps to this calculated address with a delay of one instruction. 
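Here is a hypothetical Python sketch of the target calculation. It assumes the caller passes the address of the delay slot (the jump's own address plus 4) and models addresses as unsigned integers. `j_target` is an illustrative name only.

```python
def j_target(delay_slot_pc: int, target: int, mode64: bool = False) -> int:
    """Hypothetical model of the J target address calculation."""
    if not 0 <= target < (1 << 26):
        raise ValueError("target is a 26-bit field")
    pc = delay_slot_pc & ((1 << 64) - 1 if mode64 else 0xFFFF_FFFF)
    high = pc & ~0x0FFF_FFFF  # PC31..28 (or PC63..28) of the delay slot
    return high | (target << 2)  # high bits || temp || 0^2
```

Because the high-order bits come from the delay slot, a J placed in the last word of a 256 MB region jumps within the following region. For example, `j_target(0x1000_0000, 0)` returns 0x1000_0000 for a J at 0x0FFF_FFFC.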


Operation: 


32 T:   temp <- target 
   T+1: PC <- PC31...28 || temp || 0^2 


64 T:   temp <- target 
   T+1: PC <- PC63...28 || temp || 0^2 


Exceptions: 


None 
JAL Jump And Link JAL 


31 26 25 0 
JAL target 
000011 
6 26 
Format: 
JAL target 
Description: 


The 26-bit target is shifted left two bits and combined with the high-order four bits 
of the address of the delay slot to calculate the target address. The program 
unconditionally jumps to this calculated address with a delay of one instruction. 
The address of the instruction after the delay slot is placed in the link register, r31. 


Operation: 


32 T:   temp <- target 
        GPR[31] <- PC + 8 
   T+1: PC <- PC31...28 || temp || 0^2 


64 T:   temp <- target 
        GPR[31] <- PC + 8 
   T+1: PC <- PC63...28 || temp || 0^2 


Exceptions: 
None 
JALR Jump And Link Register JALR 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs 0 rd 0 JALR 
000000 00000 00000 001001 
6 5 5 5 5 6 


Format: 


JALR rs 
JALR rd, rs 


Description: 


The program unconditionally jumps to the address contained in general purpose 
register rs, with a delay of one instruction. The address of the instruction after the 
delay slot is stored in general purpose register rd. The default value of rd, if 
omitted in the assembly language instruction, is 31. 


Register numbers rs and rd should not be equal, because such an instruction does 
not have the same effect when re-executed: if they are equal, the contents of rs 
are destroyed by storing the link address. If such an instruction is executed, no 
exception occurs, but the result is undefined. 


Since instructions must be word-aligned, a Jump and Link Register instruction 
must specify a target register (rs) which contains an address whose low-order two 
bits are zero. If these low-order two bits are not zero, an address exception will 
occur when the jump target instruction is fetched. 


Operation: 


32, 64 T:   temp <- GPR[rs] 
            GPR[rd] <- PC + 8 
       T+1: PC <- temp 


Exceptions: 


None 
JR Jump Register JR 


31 26 25 21 20 6 5 0 
SPECIAL rs 0 JR 
000000 0000000 0000 0000 001000 
6 5 15 6 
Format: 
JR rs 
Description: 


The program unconditionally jumps to the address contained in general purpose 
register rs, with a delay of one instruction. 


Since instructions must be word-aligned, a Jump Register instruction must 
specify a target register (rs) which contains an address whose low-order two bits 
are zero. If these low-order two bits are not zero, an address exception will occur 
when the jump target instruction is fetched. 


Operation: 
32, 64 T:   temp <- GPR[rs] 
       T+1: PC <- temp 
Exceptions: 
None 
LB Load Byte LB 
31 26 25 21 20 16 15 0 
LB base rt offset 
100000 
6 5 5 16 
Format: 


LB rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the byte at the memory 
location specified by the address are sign-extended and loaded into general 
purpose register rt. 
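The byte-lane selection in the Operation can be sketched in Python. This sketch makes three assumptions: the doubleword returned by LoadMemory is modeled as a 64-bit integer whose byte lane k occupies bits 8k+7..8k, ReverseEndian is 0, and address translation is ignored. `select_byte`, `lb`, and `lbu` are hypothetical names; `lbu` models the LBU instruction for comparison.

```python
MASK64 = (1 << 64) - 1


def select_byte(dword: int, vaddr: int, big_endian_cpu: bool) -> int:
    """byte <- vAddr2..0 xor BigEndianCPU^3; return mem(7+8*byte)..(8*byte)."""
    lane = (vaddr & 0b111) ^ (0b111 if big_endian_cpu else 0)
    return (dword >> (8 * lane)) & 0xFF


def lb(dword: int, vaddr: int, big_endian_cpu: bool) -> int:
    """Sign-extend the selected byte to 64 bits (64-bit mode LB)."""
    b = select_byte(dword, vaddr, big_endian_cpu)
    return (b - 0x100) & MASK64 if b & 0x80 else b


def lbu(dword: int, vaddr: int, big_endian_cpu: bool) -> int:
    """Zero-extend the selected byte (64-bit mode LBU)."""
    return select_byte(dword, vaddr, big_endian_cpu)
```

On a big-endian CPU, address offset 0 selects the most-significant byte lane. On a little-endian CPU, it selects the least-significant lane.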


Operation: 


32 T: vAddr <- ((offset15)^16 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddrPSIZE-1...3 || (pAddr2...0 xor ReverseEndian^3) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr2...0 xor BigEndianCPU^3 
      GPR[rt] <- (mem7+8*byte)^24 || mem7+8*byte...8*byte 


64 T: vAddr <- ((offset15)^48 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddrPSIZE-1...3 || (pAddr2...0 xor ReverseEndian^3) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr2...0 xor BigEndianCPU^3 
      GPR[rt] <- (mem7+8*byte)^56 || mem7+8*byte...8*byte 


Exceptions: 


TLB miss exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LBU Load Byte Unsigned LBU 


31 26 25 21 20 16 15 0 
LBU base rt offset 
100100 
6 5 5 16 
Format: 


LBU rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the byte at the memory 
location specified by the address are zero-extended and loaded into general 
purpose register rt. 


Operation: 


32 T: vAddr <- ((offset15)^16 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddrPSIZE-1...3 || (pAddr2...0 xor ReverseEndian^3) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr2...0 xor BigEndianCPU^3 
      GPR[rt] <- 0^24 || mem7+8*byte...8*byte 

64 T: vAddr <- ((offset15)^48 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddrPSIZE-1...3 || (pAddr2...0 xor ReverseEndian^3) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr2...0 xor BigEndianCPU^3 
      GPR[rt] <- 0^56 || mem7+8*byte...8*byte 


Exceptions: 
TLB miss exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LD Load Doubleword LD 
31 26 25 21 20 16 15 0 
LD base rt offset 
110111 
6 5 5 16 
Format: 


LD rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the 64-bit doubleword at 
the memory location specified by the address are loaded into general purpose 
register rt. 


If any of the low-order three bits of the address are not zero, an address error 
exception occurs. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


64 T: vAddr <- ((offset15)^48 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      GPR[rt] <- mem 


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 


Exceptions: 


TLB miss exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
LDCz Load Doubleword To Coprocessor z LDCz 


31 26 25 21 20 16 15 0 
LDCz base rt offset 
1101xx* 
6 5 5 16 


Format: 


LDCz rt, offset(base) 


Description: 




The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The processor loads a doubleword from 
the addressed memory location to CPz. The manner in which each coprocessor 
uses the data is defined by the individual coprocessor specifications. 


If any of the low-order three bits of the address are not zero, an address error 
exception takes place. 


This instruction is not valid for use with CP0. 


When CP1 is specified, if the FR bit of the Status register is zero and the 
least-significant bit of the rt field is not zero, the operation of the instruction is 
undefined. If the FR bit is one, either an odd- or an even-numbered register can be 
specified by rt. 


* Refer to the Opcode Bit Encoding table on the next page, or to 
16.7 CPU Instruction Opcode Bit Encoding. 
LDCz Load Doubleword To Coprocessor z LDCz 


(continued) 


Operation: 


32 T: vAddr <- ((offset15)^16 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      COPzLD (rt, mem) 


64 T: vAddr <- ((offset15)^48 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      COPzLD (rt, mem) 


Exceptions: 


TLB miss exception 

TLB invalid exception 

Bus error exception 

Address error exception 
Coprocessor unusable exception 


Opcode Bit Encoding: 
LDCz   Bit# 31 30 29 28 | 27 26 
LDC1         1  1  0  1 |  0  1 
LDC2         1  1  0  1 |  1  0 
             Opcode     | Coprocessor Number 
LDL Load Doubleword Left LDL 


31 26 25 21 20 16 15 0 
LDL base rt offset 
011010 
6 5 5 16 
Format: 


LDL rt, offset(base) 


Description: 


This instruction is used in combination with the LDR instruction to load doubleword 
data that is not aligned on a doubleword boundary in memory into general purpose 
register rt. The LDL instruction loads the high-order portion of the data into the 
register, while the LDR instruction loads the low-order portion. 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to generate a virtual address that can specify any byte. Of the 
doubleword in memory whose most-significant byte is specified by the generated 
address, only the bytes within the same aligned doubleword as the target address 
are loaded into the high-order portion of general purpose register rt. The 
remaining portion of the register is not affected. Depending on the address 
specified, the number of bytes loaded ranges from 1 to 8. 


In other words, the addressed byte is first stored in the most-significant byte 
position of general purpose register rt. Then, as long as lower-order bytes of 
memory remain within the same doubleword boundary, each is stored in the next 
lower byte position of rt. The remaining low-order bytes of rt are not affected. 


Example (big-endian memory): LDL $24,3($0) 

memory address 0:   0 1 2 3 4 5 6 7 
memory address 8:   8 9 10 11 12 13 14 15 
$24 before loading: A B C D E F G H 
$24 after loading:  3 4 5 6 7 F G H 
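The LDL $24,3($0) example can be reproduced with a byte-level Python sketch. It assumes a big-endian CPU with big-endian memory, models memory as a `bytes` object indexed by address, and ignores address translation. `ldl_big_endian` is a hypothetical name, not part of any real API.

```python
def ldl_big_endian(memory: bytes, vaddr: int, reg: int) -> int:
    """Hypothetical byte-level model of LDL (big-endian CPU and memory)."""
    end = (vaddr | 0b111) + 1     # first byte past the aligned doubleword
    loaded = memory[vaddr:end]    # addressed byte through end of doubleword
    n = len(loaded)               # 1 to 8 bytes
    shift = 8 * (8 - n)
    value = int.from_bytes(loaded, "big") << shift  # into the high-order bytes
    keep = (1 << shift) - 1       # low-order bytes of rt are unchanged
    return value | (reg & keep)
```

With memory holding bytes 0 to 15 and the register initially holding the bytes A to H, `ldl_big_endian(bytes(range(16)), 3, reg)` yields the bytes 3 4 5 6 7 F G H.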
LDL Load Doubleword Left (continued) LDL 


The contents of general purpose register rt are internally bypassed within the 
processor so that no NOP instruction is needed between an immediately preceding 
load instruction which targets general purpose register rt and a subsequent LDL 
(or LDR) instruction. 


The address error exception does not occur even if the specified address is not at 
the doubleword boundary. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


64 T: vAddr <- ((offset15)^48 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddrPSIZE-1...3 || (pAddr2...0 xor ReverseEndian^3) 
      if BigEndianMem = 0 then 
          pAddr <- pAddrPSIZE-1...3 || 0^3 
      endif 
      byte <- vAddr2...0 xor BigEndianCPU^3 
      mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA) 
      GPR[rt] <- mem7+8*byte...0 || GPR[rt]55-8*byte...0 


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 
LDL Load Doubleword Left (continued) LDL 


The relationship between the address given to the LDL instruction and the result 
(bytes for registers) is shown below: 


LDL 
Register   A B C D E F G H 
Memory     I J K L M N O P 

           BigEndianCPU = 0              BigEndianCPU = 1 
vAddr2..0  destination  type  offset     destination  type  offset 
                              LEM BEM                       LEM BEM 
0          PBCDEFGH     0     0   7      IJKLMNOP     7     0   0 
1          OPCDEFGH     1     0   6      JKLMNOPH     6     0   1 
2          NOPDEFGH     2     0   5      KLMNOPGH     5     0   2 
3          MNOPEFGH     3     0   4      LMNOPFGH     4     0   3 
4          LMNOPFGH     4     0   3      MNOPEFGH     3     0   4 
5          KLMNOPGH     5     0   2      NOPDEFGH     2     0   5 
6          JKLMNOPH     6     0   1      OPCDEFGH     1     0   6 
7          IJKLMNOP     7     0   0      PBCDEFGH     0     0   7 
Remark Type: access type output to memory (Refer to Figure 3-2 Byte 
       Access within a Doubleword.) 
       Offset: pAddr2..0 output to memory 
       LEM Little-endian memory (BigEndianMem = 0) 
       BEM Big-endian memory (BigEndianMem = 1) 
Exceptions: 
TLB miss exception 
TLB invalid exception 
Bus error exception 


Address error exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
LDR Load Doubleword Right LDR 


31 26 25 21 20 16 15 0 
LDR base rt offset 
011011 
6 5 5 16 


Format: 


LDR rt, offset(base) 


Description: 


This instruction is used in combination with the LDL instruction to load doubleword 
data that is not aligned on a doubleword boundary in memory into general purpose 
register rt. The LDL instruction loads the high-order portion of the data into the 
register, while the LDR instruction loads the low-order portion. 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to generate a virtual address that can specify any byte. Of the 
doubleword in memory whose least-significant byte is specified by the generated 
address, only the bytes within the same aligned doubleword as the target address 
are loaded into the low-order portion of general purpose register rt. The 
remaining portion of the register is not affected. Depending on the address 
specified, the number of bytes loaded ranges from 1 to 8. 


In other words, the addressed byte is first stored in the least-significant byte 
position of general purpose register rt. Then, as long as higher-order bytes of 
memory remain within the same doubleword boundary, each is stored in the next 
higher byte position of rt. The remaining high-order bytes of rt are not affected. 


Example (big-endian memory): LDR $24,4($0) 

memory address 0:   0 1 2 3 4 5 6 7 
memory address 8:   8 9 10 11 12 13 14 15 
$24 before loading: A B C D E F G H 
$24 after loading:  A B C 0 1 2 3 4 
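The LDR $24,4($0) example can likewise be reproduced with a byte-level Python sketch. It assumes a big-endian CPU with big-endian memory, models memory as a `bytes` object indexed by address, and ignores address translation. `ldr_big_endian` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def ldr_big_endian(memory: bytes, vaddr: int, reg: int) -> int:
    """Hypothetical byte-level model of LDR (big-endian CPU and memory)."""
    start = vaddr & ~0b111              # start of the aligned doubleword
    loaded = memory[start:vaddr + 1]    # doubleword start through addressed byte
    n = len(loaded)                     # 1 to 8 bytes
    value = int.from_bytes(loaded, "big")  # into the low-order bytes
    keep = MASK64 ^ ((1 << (8 * n)) - 1)   # high-order bytes of rt are unchanged
    return (reg & keep) | value
```

On a big-endian system, an unaligned doubleword at address A is commonly loaded with LDL at A followed by LDR at A+7. Each instruction fills the part of the register that the other leaves unchanged.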
LDR Load Doubleword Right LDR 


(continued) 


The contents of general purpose register rt are bypassed within the processor so 
that no NOP instruction is needed between an immediately preceding load 
instruction which targets general purpose register rt and a subsequent LDR (or 
LDL) instruction. 


The address error exception does not occur even if the specified address is not 
located at the doubleword boundary. 


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit 
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


64 T: vAddr <- ((offset15)^48 || offset15...0) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddrPSIZE-1...3 || (pAddr2...0 xor ReverseEndian^3) 
      if BigEndianMem = 1 then 
          pAddr <- pAddrPSIZE-1...3 || 0^3 
      endif 
      byte <- vAddr2...0 xor BigEndianCPU^3 
      mem <- LoadMemory (uncached, DOUBLEWORD - byte, pAddr, vAddr, DATA) 
      GPR[rt] <- GPR[rt]63...64-8*byte || mem63...8*byte 


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 
LDR Load Doubleword Right (continued) LDR 


The relationship between the address given to the LDR instruction and the result 
(bytes for registers) is shown below: 


LDR 
Register   A B C D E F G H 
Memory     I J K L M N O P 

           BigEndianCPU = 0              BigEndianCPU = 1 
vAddr2..0  destination  type  offset     destination  type  offset 
                              LEM BEM                       LEM BEM 
0          IJKLMNOP     7     0   0      ABCDEFGI     0     7   0 
1          AIJKLMNO     6     1   0      ABCDEFIJ     1     6   0 
2          ABIJKLMN     5     2   0      ABCDEIJK     2     5   0 
3          ABCIJKLM     4     3   0      ABCDIJKL     3     4   0 
4          ABCDIJKL     3     4   0      ABCIJKLM     4     3   0 
5          ABCDEIJK     2     5   0      ABIJKLMN     5     2   0 
6          ABCDEFIJ     1     6   0      AIJKLMNO     6     1   0 
7          ABCDEFGI     0     7   0      IJKLMNOP     7     0   0 
Remark Type: access type output to memory (Refer to Figure 3-2 Byte 
       Access within a Doubleword.) 
       Offset: pAddr2..0 output to memory 
       LEM Little-endian memory (BigEndianMem = 0) 
       BEM Big-endian memory (BigEndianMem = 1) 
Exceptions: 


TLB miss exception 
TLB invalid exception 
Bus error exception 


Address error exception 


Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode) 
LH Load Halfword LH 


31 26 25 21 20 16 15 0 
LH base rt offset 
100001 
6 5 5 16 


Format: 


LH rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the halfword at the 
memory location specified by the address are sign-extended and loaded into 
general purpose register rt. 


If the least-significant bit of the address is not zero, an address error exception 
occurs. 


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] ← (mem_15+8*byte)^16 || mem_15+8*byte...8*byte

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] ← (mem_15+8*byte)^48 || mem_15+8*byte...8*byte


Exceptions: 


TLB miss exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LHU Load Halfword Unsigned LHU


31 26 25 21 20 16 15 0 
LHU base rt offset 
100101 
6 5 5 16 
Format: 


LHU rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the halfword at the 
memory location specified by the address are zero-extended and loaded into 
general purpose register rt. 


If the least-significant bit of the address is not zero, an address error exception 
occurs. 
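LH and LHU differ only in how the loaded halfword is extended. The C sketch below shows both rules for the 64-bit case. `load_halfword` is a hypothetical helper. It takes the aligned doubleword returned by memory and the lane index `byte` used in the Operation, and it signals the address error condition by returning false. Bit 0 of `byte` always equals bit 0 of vAddr, because the xor in the Operation does not touch that bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: extract the halfword at lane `byte` of the aligned
 * doubleword `mem` (bits 15+8*byte...8*byte, as in the Operation) and extend
 * it to 64 bits. Returns false when bit 0 is set, the address error case. */
bool load_halfword(uint64_t mem, unsigned byte, bool is_signed, uint64_t *out)
{
    if (byte & 1u)
        return false;                          /* misaligned halfword */

    uint16_t h = (uint16_t)(mem >> (8u * (byte & 7u)));
    uint64_t value = h;                        /* LHU: 0^48 || halfword */
    if (is_signed && (h & 0x8000u))
        value |= ~UINT64_C(0xFFFF);            /* LH: (bit 15)^48 || halfword */
    *out = value;
    return true;
}
```

For example, the halfword 0x8001 becomes 0xFFFFFFFFFFFF8001 under LH and 0x0000000000008001 under LHU.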


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] ← 0^16 || mem_15+8*byte...8*byte

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian^2 || 0))
      mem ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU^2 || 0)
      GPR[rt] ← 0^48 || mem_15+8*byte...8*byte


Exceptions: 
TLB miss exception
TLB invalid exception
Bus error exception
Address error exception
LL Load Linked LL


31 26 25 21 20 16 15 0
LL base rt offset
110000
6 5 5 16


Format:


LL rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to form a virtual address. The contents of the word at the memory
location specified by the address are loaded into general purpose register rt. In
64-bit mode, the loaded word is sign-extended. In addition, the physical address of
the specified memory location is stored in the LLAddr register, and the LLbit is set
to 1. The processor then checks whether the location at the address stored in the
LLAddr register is rewritten by another processor or device.


Load Linked (LL) and Store Conditional (SC) instructions can be used to 
atomically update memory: 


L1:
LL T1, (T0)
ADD T2, T1, 1
SC T2, (T0)
BEQ T2, 0, L1
NOP


This atomically increments the word addressed by T0. Changing the ADD
instruction to an OR instruction changes this to an atomic bit set.
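The same read-modify-write retry pattern can be written portably with C11 atomics. In the sketch below, whose function name is illustrative, a failed compare-exchange plays the role of a failed SC, and the loop retries just as the branch back to L1 does. On MIPS targets, compilers typically implement such loops with LL/SC. The exact instruction sequence is the compiler's choice, however, and this manual does not guarantee it.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically increments *p and returns the new value. */
uint32_t atomic_increment(_Atomic uint32_t *p)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);  /* like LL */
    /* Compute the new value (like ADD) and try to publish it (like SC).
     * On failure, `old` is reloaded with the current value and the loop
     * retries, like branching back to L1. */
    while (!atomic_compare_exchange_weak(p, &old, old + 1u))
        ;
    return old + 1u;
}
```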


This instruction is available in User mode, and it is not necessary to enable CP0.


This instruction is defined to maintain software compatibility with the
VR4400.
CPU Instruction Set Details




If the specified address is in an uncached area, the operation of the LL instruction
is undefined. A cache miss that occurs between the LL and SC instructions
interferes with execution of the SC instruction. As a rule, therefore, do not place a
load or store instruction between the LL and SC instructions; otherwise, the
operation of the SC instruction is not guaranteed. Frequently occurring exceptions
also interfere with execution of the SC instruction, so such exceptions must be
disabled temporarily.


If either of the low-order two bits of the address is not zero, an address error
exception takes place.


Operation:

32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU || 0^2)
      GPR[rt] ← mem_31+8*byte...8*byte
      LLbit ← 1
      LLAddr ← pAddr

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU || 0^2)
      GPR[rt] ← (mem_31+8*byte)^32 || mem_31+8*byte...8*byte
      LLbit ← 1
      LLAddr ← pAddr
Exceptions: 


TLB miss exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LLD Load Linked Doubleword LLD


31 26 25 21 20 16 15 0 
LLD base rt offset 
110100 
6 5 5 16 
Format: 


LLD rt, offset(base) 


Description: 




The 16-bit offset is sign-extended and added to the contents of general purpose
register base to form a virtual address. The contents of the doubleword at the
memory location specified by the address are loaded into general purpose register
rt. In addition, the physical address of the specified memory location is stored in
the LLAddr register, and the LLbit is set to 1. The processor then checks whether
the location at the address stored in the LLAddr register is rewritten by another
processor or device.


Load Linked Doubleword (LLD) instruction and Store Conditional Doubleword 
(SCD) instruction can be used to atomically update the memory: 


L1:
LLD T1, (T0)
DADD T2, T1, 1
SCD T2, (T0)
BEQ T2, 0, L1
NOP


This atomically increments the doubleword addressed by T0. Changing the
DADD instruction to an OR instruction changes this to an atomic bit set.
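The OR variant mentioned above, an atomic doubleword bit set, can be sketched in C11 as follows. The helper name is hypothetical. `atomic_fetch_or` from <stdatomic.h> would do the same job in a single call; the explicit loop is shown only to mirror the LLD/SCD retry structure.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically ORs `mask` into *p and returns the value before the update. */
uint64_t atomic_set_bits64(_Atomic uint64_t *p, uint64_t mask)
{
    uint64_t old = atomic_load_explicit(p, memory_order_relaxed);   /* like LLD */
    while (!atomic_compare_exchange_weak(p, &old, old | mask))     /* OR + SCD */
        ;                                                          /* retry on failure */
    return old;
}
```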


This instruction is defined to maintain software compatibility with the
VR4400.


If the specified address is in an uncached area, the operation of the LLD
instruction is undefined. A cache miss that occurs between the LLD and SCD
instructions interferes with execution of the SCD instruction. As a rule, therefore,
do not place a load or store instruction between the LLD and SCD instructions;
otherwise, the operation of the SCD instruction is not guaranteed. Frequently
occurring exceptions also interfere with execution of the SCD instruction, so such
exceptions must be disabled temporarily.


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      mem ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      GPR[rt] ← mem
      LLbit ← 1
      LLAddr ← pAddr

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      mem ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      GPR[rt] ← mem
      LLbit ← 1
      LLAddr ← pAddr


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 


Exceptions: 


TLB miss exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
LUI Load Upper Immediate LUI


31 26 25 21 20 16 15 0
LUI 0 rt immediate
001111 00000
6 5 5 16


Format: 


LUI rt, immediate 


Description: 


The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of
low-order zeros. The result is placed into general purpose register rt. In 64-bit
mode, the resulting word is sign-extended to 64 bits.
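The following C sketch models LUI in 64-bit mode; `lui64` and `ori64` are hypothetical names. It also shows the common LUI + ORI idiom for building a 32-bit constant. Because ORI zero-extends its immediate, ORI fills in the low 16 bits without disturbing the upper bits produced by LUI.

```c
#include <stdint.h>

/* 64-bit-mode LUI: (immediate15)^32 || immediate || 0^16. */
uint64_t lui64(uint16_t immediate)
{
    uint32_t word = (uint32_t)immediate << 16;       /* immediate || 0^16 */
    uint64_t result = word;
    if (word & 0x80000000u)
        result |= UINT64_C(0xFFFFFFFF00000000);      /* replicate bit 31 */
    return result;
}

/* ORI zero-extends its immediate, so it only affects bits 15...0. */
uint64_t ori64(uint64_t rs, uint16_t immediate)
{
    return rs | (uint64_t)immediate;
}
```

For example, `lui64(0x8000)` yields 0xFFFFFFFF80000000, because bit 31 of the shifted immediate is replicated into bits 63...32. `ori64(lui64(0x1234), 0x5678)` yields 0x12345678.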


Operation: 


32 T: GPR[rt] ← immediate || 0^16


64 T: GPR[rt] ← (immediate_15)^32 || immediate || 0^16


Exceptions: 


None 
LW Load Word LW


31 26 25 21 20 16 15 0
LW base rt offset 
100011 
6 5 5 16 
Format: 


LW rt, offset(base)


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the word at the memory 
location specified by the address are loaded into general purpose register rt. In
64-bit mode, the loaded word is sign-extended to 64 bits.


If either of the low-order two bits of the address is not zero, an address error 
exception occurs. 


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      GPR[rt] ← mem

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      GPR[rt] ← (mem_31)^32 || mem

Exceptions: 




TLB miss exception 
TLB invalid exception 
Bus error exception 
Address error exception 


LWCz Load Word To Coprocessor z LWCz


31 26 25 21 20 16 15 0 
LWCz base rt offset 
1100xx* 
6 5 5 16 
Format: 


LWCz rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The processor loads the word at the
addressed memory location into general purpose register rt of CPz. The manner
in which each coprocessor uses the data is defined by the individual coprocessor
specifications.


If either of the low-order two bits of the address is not zero, an address error 
exception occurs. 


This instruction is not valid for use with CP0.


* Refer to the Opcode Bit Encoding table for this instruction, or to
16.7 CPU Instruction Opcode Bit Encoding.


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU || 0^2)
      COPzLW (byte, rt, mem)

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor (ReverseEndian || 0^2))
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor (BigEndianCPU || 0^2)
      COPzLW (byte, rt, mem)


Exceptions: 


TLB miss exception 

TLB invalid exception 

Bus error exception 

Address error exception 
Coprocessor unusable exception 


Opcode Bit Encoding:

Bit #   31 30 29 28   27 26
LWC1     1  1  0  0    0  1
LWC2     1  1  0  0    1  0

Bits 31...28: opcode; bits 27...26: coprocessor number.
LWL Load Word Left LWL


31 26 25 21 20 16 15 0 
LWL base rt offset 
100010 
6 5 5 16 
Format: 


LWL rt, offset(base) 


Description: 


This instruction is used in combination with the LWR instruction to load word
data that is not aligned on a word boundary in memory into general purpose
register rt. The LWL instruction loads the high-order portion of the data into the
register, while the LWR instruction loads the low-order portion.


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to generate a virtual address that can specify any byte. Of the word
in memory whose most-significant byte is specified by the generated address,
only the bytes within the same aligned word as the target address are loaded into
the high-order portion of general purpose register rt. The remaining portion of the
register is not affected. Depending on the address specified, from 1 to 4 bytes are
loaded.


In other words, the addressed byte is first stored in the most-significant byte
position of general purpose register rt. If lower-order bytes follow within the same
aligned word, each is stored in the next lower byte position of general purpose
register rt, up to the word boundary.


The remaining low-order bytes of the register are not affected.


Memory (big-endian):
    address 4:  4  5  6  7
    address 0:  0  1  2  3

Register $24 before loading:  A  B  C  D

    LWL $24, 1($0)

Register $24 after loading:   1  2  3  D
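The figure can be reproduced with a small byte-level C model of LWL. The sketch assumes a big-endian CPU and memory and a 32-bit register, so the 64-bit sign-extension step is not modeled. Memory is a plain byte array indexed by address. `lwl_be` is a hypothetical name, and register bytes A B C D are encoded as 0x0A0B0C0D purely for illustration.

```c
#include <stdint.h>

/* Byte-level model of LWL for a big-endian CPU and big-endian memory,
 * 32-bit register only. `mem` is indexed by byte address; register bytes
 * are written most-significant first. */
uint32_t lwl_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    uint32_t count = 4u - (addr & 3u);   /* bytes from addr up to the word boundary */
    for (uint32_t i = 0; i < count; i++) {
        uint32_t shift = 8u * (3u - i);  /* fill from the most-significant byte down */
        reg = (reg & ~(UINT32_C(0xFF) << shift))
            | ((uint32_t)mem[addr + i] << shift);
    }
    return reg;                          /* untouched low-order bytes are preserved */
}
```

With memory bytes 0 through 7 at addresses 0 through 7, `lwl_be(0x0A0B0C0D, mem, 1)` gives 0x0102030D, which is the "1 2 3 D" result in the figure.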


The contents of general purpose register rt are bypassed within the processor, so
no NOP instruction is needed between an immediately preceding load instruction
that targets general purpose register rt and a subsequent LWL (or LWR)
instruction.


An address error exception does not occur even if the specified address is not
aligned on a word boundary.


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr ← pAddr_PSIZE-1...2 || 0^2
      endif
      byte ← vAddr_1...0 xor BigEndianCPU^2
      word ← vAddr_2 xor BigEndianCPU
      mem ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
      temp ← mem_32*word+8*byte+7...32*word || GPR[rt]_23-8*byte...0
      GPR[rt] ← temp

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr ← pAddr_PSIZE-1...2 || 0^2
      endif
      byte ← vAddr_1...0 xor BigEndianCPU^2
      word ← vAddr_2 xor BigEndianCPU
      mem ← LoadMemory (uncached, byte, pAddr, vAddr, DATA)
      temp ← mem_32*word+8*byte+7...32*word || GPR[rt]_23-8*byte...0
      GPR[rt] ← (temp_31)^32 || temp


The relationship between the address given to the LWL instruction and the result
(the bytes in the register) is shown below:


LWL
Register:  A B C D E F G H
Memory:    I J K L M N O P

                 BigEndianCPU = 0                BigEndianCPU = 1
                                 offset                          offset
vAddr_2...0   destination  type  LEM BEM    destination  type  LEM BEM
     0        SSSSPFGH      0     0   7     SSSSIJKL      3     4   0
     1        SSSSOPGH      1     0   6     SSSSJKLH      2     4   1
     2        SSSSNOPH      2     0   5     SSSSKLGH      1     4   2
     3        SSSSMNOP      3     0   4     SSSSLFGH      0     4   3
     4        SSSSLFGH      0     4   3     SSSSMNOP      3     0   4
     5        SSSSKLGH      1     4   2     SSSSNOPH      2     0   5
     6        SSSSJKLH      2     4   1     SSSSOPGH      1     0   6
     7        SSSSIJKL      3     4   0     SSSSPFGH      0     0   7

Remark  Type:   access type output to memory (Refer to Figure 3-2 Byte
                Access within a Doubleword.)
        Offset: pAddr_2...0 output to memory
        LEM:    Little-endian memory (BigEndianMem = 0)
        BEM:    Big-endian memory (BigEndianMem = 1)
        S:      Sign-extension of destination bit 31
Exceptions: 


TLB miss exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LWR Load Word Right LWR


31 26 25 21 20 16 15 0 
LWR base rt offset 
100110 
6 5 5 16 
Format: 


LWR rt, offset(base)


Description: 


This instruction is used in combination with the LWL instruction to load word
data that is not aligned on a word boundary in memory into general purpose
register rt. The LWL instruction loads the high-order portion of the data into the
register, while the LWR instruction loads the low-order portion.


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to generate a virtual address that can specify any byte. Of the word
in memory whose least-significant byte is specified by the generated address,
only the bytes within the same aligned word as the target address are loaded into
the low-order portion of general purpose register rt. The remaining portion of the
register is not affected. Depending on the address specified, from 1 to 4 bytes are
loaded.


In other words, the addressed byte is first stored in the least-significant byte
position of general purpose register rt. If higher-order bytes remain within the
same aligned word, each is stored in the next higher byte position of general
purpose register rt, up to the word boundary.


The remaining high-order bytes of the register are not affected.


Memory (big-endian):
    address 4:  4  5  6  7
    address 0:  0  1  2  3

Register $24 before loading:  A  B  C  D

    LWR $24, 4($0)

Register $24 after loading:   A  B  C  4
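The following C sketch is a byte-level model of LWR. It also shows how LWL and LWR together load an unaligned word: LWL is given the address of the first byte, and LWR is given the address of the last byte (address + 3). The sketch assumes a big-endian CPU and memory, a 32-bit register (sign extension is not modeled), and memory as a byte array indexed by address. All function names are hypothetical.

```c
#include <stdint.h>

/* Byte-level model of LWR (big-endian CPU and memory, 32-bit register):
 * the addressed byte goes to the least-significant byte, and lower-addressed
 * bytes of the same aligned word fill the higher byte positions. */
uint32_t lwr_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    uint32_t count = (addr & 3u) + 1u;   /* bytes from the word boundary up to addr */
    for (uint32_t i = 0; i < count; i++) {
        uint32_t shift = 8u * i;         /* fill from the least-significant byte up */
        reg = (reg & ~(UINT32_C(0xFF) << shift))
            | ((uint32_t)mem[addr - i] << shift);
    }
    return reg;
}

/* Companion LWL model, included so this block is self-contained. */
uint32_t lwl_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    uint32_t count = 4u - (addr & 3u);
    for (uint32_t i = 0; i < count; i++) {
        uint32_t shift = 8u * (3u - i);
        reg = (reg & ~(UINT32_C(0xFF) << shift))
            | ((uint32_t)mem[addr + i] << shift);
    }
    return reg;
}

/* Unaligned word load: LWL rt, 0(base) followed by LWR rt, 3(base). */
uint32_t load_unaligned_be(const uint8_t *mem, uint32_t addr)
{
    uint32_t r = 0;
    r = lwl_be(r, mem, addr);            /* high-order portion */
    r = lwr_be(r, mem, addr + 3u);       /* low-order portion */
    return r;
}
```

With memory bytes 0 through 7, `lwr_be(0x0A0B0C0D, mem, 4)` gives 0x0A0B0C04 (the "A B C 4" result in the figure). `load_unaligned_be(mem, 1)` gives 0x01020304.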


The contents of general purpose register rt are bypassed within the processor, so
no NOP instruction is needed between an immediately preceding load instruction
that targets general purpose register rt and a following LWL (or LWR)
instruction.


An address error exception does not occur even if the specified address is not
aligned on a word boundary.


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr ← pAddr_PSIZE-1...3 || 0^3
      endif
      byte ← vAddr_1...0 xor BigEndianCPU^2
      word ← vAddr_2 xor BigEndianCPU
      mem ← LoadMemory (uncached, WORD-byte, pAddr, vAddr, DATA)
      temp ← GPR[rt]_31...32-8*byte || mem_31+32*word...32*word+8*byte
      GPR[rt] ← temp

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1...3 || (pAddr_2...0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr ← pAddr_PSIZE-1...3 || 0^3
      endif
      byte ← vAddr_1...0 xor BigEndianCPU^2
      word ← vAddr_2 xor BigEndianCPU
      mem ← LoadMemory (uncached, WORD-byte, pAddr, vAddr, DATA)
      temp ← GPR[rt]_31...32-8*byte || mem_31+32*word...32*word+8*byte
      GPR[rt] ← (temp_31)^32 || temp


The relationship between the address given to the LWR instruction and the result
(the bytes in the register) is shown below:


LWR
Register:  A B C D E F G H
Memory:    I J K L M N O P

                 BigEndianCPU = 0                BigEndianCPU = 1
                                 offset                          offset
vAddr_2...0   destination  type  LEM BEM    destination  type  LEM BEM
     0        SSSSMNOP      3     0   4     XXXXEFGI      0     7   0
     1        XXXXEMNO      2     1   4     XXXXEFIJ      1     6   0
     2        XXXXEFMN      1     2   4     XXXXEIJK      2     5   0
     3        XXXXEFGM      0     3   4     SSSSIJKL      3     4   0
     4        SSSSIJKL      3     4   0     XXXXEFGM      0     3   4
     5        XXXXEIJK      2     5   0     XXXXEFMN      1     2   4
     6        XXXXEFIJ      1     6   0     XXXXEMNO      2     1   4
     7        XXXXEFGI      0     7   0     SSSSMNOP      3     0   4

Remark  Type:   access type output to memory (Refer to Figure 3-2 Byte
                Access within a Doubleword.)
        Offset: pAddr_2...0 output to memory
        LEM:    Little-endian memory (BigEndianMem = 0)
        BEM:    Big-endian memory (BigEndianMem = 1)
        S:      Sign-extension of destination bit 31
        X:      Not affected (in 32-bit mode); sign-extension of destination
                bit 31 (in 64-bit mode)
Exceptions: 
TLB miss exception
TLB invalid exception 
Bus error exception 
Address error exception 
LWU Load Word Unsigned LWU


31 26 25 21 20 16 15 0 
LWU base rt offset 
100111 
6 5 5 16 
Format: 


LWU rt, offset(base)


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the word at the memory 
location specified by the address are loaded into general purpose register rt. The
loaded word is zero-extended in 64-bit mode.


If either of the low-order two bits of the effective address is not zero, an address 
error exception occurs. 
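In 64-bit mode, LW and LWU differ only in how bits 63...32 are filled. The sketch below applies both rules to the 32-bit word read from memory and includes the alignment check that determines whether an address error exception occurs. All helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* LW in 64-bit mode: (mem31)^32 || mem. */
uint64_t lw64(uint32_t mem)
{
    uint64_t r = mem;
    if (mem & 0x80000000u)
        r |= UINT64_C(0xFFFFFFFF00000000);
    return r;
}

/* LWU in 64-bit mode: 0^32 || mem. */
uint64_t lwu64(uint32_t mem)
{
    return (uint64_t)mem;
}

/* Both instructions raise an address error exception unless vAddr1...0 = 00. */
bool word_aligned(uint64_t vaddr)
{
    return (vaddr & 3u) == 0;
}
```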


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode 
causes a reserved instruction exception. 


Operation: 


32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      GPR[rt] ← mem

64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
      GPR[rt] ← 0^32 || mem


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during
virtual address creation. 


Exceptions: 


TLB miss exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
MFC0 Move From System Control Coprocessor MFC0


31 26 25 21 20 16 15 11 10 0
COP0 MF rt rd 0
010000 00000 000 0000 0000
6 5 5 5 11
Format: 
MFC0 rt, rd
Description: 


The contents of general purpose register rd of CP0 are loaded into general
purpose register rt.


Operation: 


32 T:   data ← CPR[0,rd]
   T+1: GPR[rt] ← data


64 T:   data ← CPR[0,rd]
   T+1: GPR[rt] ← (data_31)^32 || data_31...0


Exceptions: 


Coprocessor unusable exception (VR4300 in 64-/32-bit User and Supervisor
mode if CP0 is disabled)
MFCz Move From Coprocessor z MFCz


31 26 25 21 20 16 15 11 10 0 
COPz MF rt rd 0 
0100xx* 00000 000 0000 0000 
6 5 5 5 11 


Format: 
MFCz rt, rd 


Description: 


The contents of general purpose register rd of CPz are loaded into general purpose 
register rt. 


Operation:

32 T:   data ← CPR[z,rd]
   T+1: GPR[rt] ← data

64 T:   if rd_0 = 0 then
            data ← CPR[z, rd_4...1 || 0]_31...0
        else
            data ← CPR[z, rd_4...1 || 0]_63...32
        endif
   T+1: GPR[rt] ← (data_31)^32 || data
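The 64-bit Operation above treats an even/odd pair of coprocessor registers as one 64-bit register, and bit 0 of rd selects which half is read. The C sketch below models only that selection and the sign extension. Representing the coprocessor registers as `uint64_t cpr[32]`, with each pair stored in its even entry, is an assumption made for illustration. The function name is hypothetical.

```c
#include <stdint.h>

/* Sketch of 64-bit-mode MFCz: rd4...1 || 0 picks the even register of the
 * pair, rd0 picks the low (0) or high (1) word, and the word is sign-extended. */
uint64_t mfcz64(const uint64_t cpr[32], unsigned rd)
{
    uint64_t pair = cpr[rd & 0x1Eu];                     /* CPR[z, rd4...1 || 0] */
    uint32_t data = (rd & 1u) ? (uint32_t)(pair >> 32)   /* bits 63...32 */
                              : (uint32_t)pair;          /* bits 31...0  */
    uint64_t r = data;
    if (data & 0x80000000u)
        r |= UINT64_C(0xFFFFFFFF00000000);               /* (data31)^32 || data */
    return r;
}
```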


Exceptions: 


Coprocessor unusable exception 


* Refer to the Opcode Bit Encoding table for this instruction, or to
16.7 CPU Instruction Opcode Bit Encoding.


Opcode Bit Encoding:

Bit #   31 30 29 28   27 26   25 24 23 22 21
MFC0     0  1  0  0    0  0    0  0  0  0  0
MFC1     0  1  0  0    0  1    0  0  0  0  0
MFC2     0  1  0  0    1  0    0  0  0  0  0

Bits 31...28: opcode; bits 27...26: coprocessor number; bits 25...21: coprocessor
sub-opcode.
MFHI Move From HI MFHI


31 26 25 16 15 11 10 6 5 0
SPECIAL 0 rd 0 MFHI
000000 00 0000 0000 00000 010000
6 10 5 5 6
Format: 
MFHI rd 
Description: 


The contents of special register HI are loaded into general purpose register rd.


To ensure proper operation in the event of interrupts, neither of the two
instructions that follow an MFHI instruction may be an instruction that modifies
the HI register: MULT, MULTU, DIV, DIVU, MTHI, DMULT, DMULTU, DDIV,
or DDIVU.
Operation:

32, 64 T: GPR[rd] ← HI
Exceptions: 

None 
MFLO Move From LO MFLO


31 26 25 16 15 11 10 6 5 0
SPECIAL 0 rd 0 MFLO
000000 00 0000 0000 00000 010010
6 10 5 5 6
Format: 
MFLO rd
Description: 


The contents of special register LO are loaded into general purpose register rd. 


To ensure proper operation in the event of interrupts, neither of the two
instructions that follow an MFLO instruction may be an instruction that modifies
the LO register: MULT, MULTU, DIV, DIVU, MTLO, DMULT, DMULTU, DDIV,
or DDIVU.
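The MFHI/MFLO scheduling rule can be checked mechanically. The following C sketch is a hypothetical checker, not a VR4300 tool. It takes a list of instruction mnemonics and ignores operands. It reports the first instruction that writes HI (or LO) within two instructions after an MFHI (or MFLO). It covers only this rule, not the separate MTHI/MTLO restrictions.

```c
#include <stdbool.h>
#include <string.h>

/* Instructions that write both HI and LO (from the lists in the text). */
static const char *const MULDIV[] = {
    "MULT", "MULTU", "DIV", "DIVU", "DMULT", "DMULTU", "DDIV", "DDIVU"
};

static bool is_muldiv(const char *m)
{
    for (size_t i = 0; i < sizeof MULDIV / sizeof MULDIV[0]; i++)
        if (strcmp(m, MULDIV[i]) == 0)
            return true;
    return false;
}

/* Returns the index of the first instruction that writes HI (LO) within the
 * two instructions after an MFHI (MFLO), or -1 if the sequence is clean. */
int find_hilo_hazard(const char *const prog[], int n)
{
    for (int i = 0; i < n; i++) {
        bool from_hi = strcmp(prog[i], "MFHI") == 0;
        bool from_lo = strcmp(prog[i], "MFLO") == 0;
        if (!from_hi && !from_lo)
            continue;
        for (int j = i + 1; j <= i + 2 && j < n; j++) {
            if (is_muldiv(prog[j]) ||
                (from_hi && strcmp(prog[j], "MTHI") == 0) ||
                (from_lo && strcmp(prog[j], "MTLO") == 0))
                return j;
        }
    }
    return -1;
}
```

For example, the sequence MFLO, ADDU, MULT is flagged at the MULT (index 2). Inserting one more instruction before the MULT clears the warning.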


Operation: 


32, 64 T: GPR[rd] ← LO


Exceptions: 


None 
MTC0 Move To System Control Coprocessor MTC0


31 26 25 21 20 16 15 11 10 0
COP0 MT rt rd 0
010000 00100 000 0000 0000
6 5 5 5 11 
Format: 
MTC0 rt, rd
Description: 


The contents of general purpose register rt are loaded into general purpose register
rd of CP0.


Because the contents of the TLB may be altered by this instruction, the operation
of load instructions, store instructions, and TLB operations immediately before
and after this instruction is undefined.


If the register manipulated by this instruction is used by an instruction before or 
after this instruction, place that instruction at an appropriate position by referring 
to Chapter 19 Coprocessor 0 Hazards. 


Operation: 

32, 64 T:   data ← GPR[rt]
       T+1: CPR[0,rd] ← data

Exceptions: 


Coprocessor unusable exception (VR4300 in 64-/32-bit User and Supervisor
mode if CP0 is disabled)
MTCz Move To Coprocessor z MTCz


31 26 25 21 20 16 15 11 10 0
COPz MT rt rd 0
0100xx* 00100 000 0000 0000
6 5 5 5 11


Format: 
MTCz rt, rd 


Description: 


The contents of general purpose register rt are loaded into general purpose register 
rd of CPz. 


Operation:

32 T:   data ← GPR[rt]
   T+1: CPR[z,rd] ← data

64 T:   data ← GPR[rt]_31...0
   T+1: if rd_0 = 0 then
            CPR[z, rd_4...1 || 0] ← CPR[z, rd_4...1 || 0]_63...32 || data
        else
            CPR[z, rd_4...1 || 0] ← data || CPR[z, rd_4...1 || 0]_31...0
        endif
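The 64-bit MTCz Operation above writes the low word of GPR[rt] into one half of the even register of the pair and keeps the other half unchanged. The C sketch below models this. It assumes, for illustration only, that the coprocessor registers are stored as `uint64_t cpr[32]`, with each even/odd pair held in its even entry. The function name is hypothetical.

```c
#include <stdint.h>

/* Sketch of 64-bit-mode MTCz: data <- GPR[rt]31...0 is merged into the low
 * (rd0 = 0) or high (rd0 = 1) word of CPR[z, rd4...1 || 0]. */
void mtcz64(uint64_t cpr[32], unsigned rd, uint64_t gpr_rt)
{
    uint64_t data = (uint32_t)gpr_rt;                /* GPR[rt]31...0 */
    uint64_t *pair = &cpr[rd & 0x1Eu];
    if ((rd & 1u) == 0)
        *pair = (*pair & UINT64_C(0xFFFFFFFF00000000)) | data;          /* keep 63...32 */
    else
        *pair = (data << 32) | (*pair & UINT64_C(0x00000000FFFFFFFF));  /* keep 31...0 */
}
```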
Exceptions: 


Coprocessor unusable exception 


* Refer to the Opcode Bit Encoding table for this instruction, or to
16.7 CPU Instruction Opcode Bit Encoding.


Opcode Bit Encoding:

Bit #   31 30 29 28   27 26   25 24 23 22 21
MTC0     0  1  0  0    0  0    0  0  1  0  0
MTC1     0  1  0  0    0  1    0  0  1  0  0
MTC2     0  1  0  0    1  0    0  0  1  0  0

Bits 31...28: opcode; bits 27...26: coprocessor number; bits 25...21: coprocessor
sub-opcode.
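The field layouts and bit encodings of MFCz and MTCz can be combined to assemble complete instruction words, as in the C sketch below. The function and constant names are illustrative. The field positions come from the format diagrams for these instructions.

```c
#include <stdint.h>

enum { COP_SUB_MF = 0x00, COP_SUB_MT = 0x04 };   /* bits 25...21 */

/* Assembles an MFCz/MTCz word: 0100 || z | sub-opcode | rt | rd | 0 (11 bits). */
uint32_t encode_cop_move(unsigned z, unsigned sub, unsigned rt, unsigned rd)
{
    return ((0x10u | (z & 3u)) << 26)   /* opcode 0100 + coprocessor number */
         | ((sub & 0x1Fu) << 21)        /* MF = 00000, MT = 00100 */
         | ((rt & 0x1Fu) << 16)         /* general purpose register rt */
         | ((rd & 0x1Fu) << 11);        /* coprocessor register rd; bits 10...0 stay zero */
}
```

For example, rt = 8 and rd = 12 give 0x40086000 for MFC0 and 0x40886000 for MTC0.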
MTHI Move To HI MTHI


31 26 25 21 20 6 5 0
SPECIAL rs 0 MTHI
000000 000 0000 0000 0000 010001
6 5 15 6
Format: 
MTHI rs 
Description: 


The contents of general purpose register rs are loaded into special register HI.


If the MTHI instruction is executed following the MULT, MULTU, DIV, or
DIVU instruction, the operation is performed normally. However, if the MFLO,
MFHI, MTLO, or MTHI instruction is executed following the MTHI instruction,
the contents of special register LO are undefined.


Operation: 
32, 64 T-2: HI ← undefined
       T-1: HI ← undefined
       T:   HI ← GPR[rs]
Exceptions: 
None 
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MTLO Move To LO MTLO 


31 26 25 21 20 6 5 0
SPECIAL rs 0 MTLO
000000 000 0000 0000 0000 010011
6 5 15 6 
Format: 
MTLO rs
Description: 


The contents of general purpose register rs are loaded into special register LO. 


If the MTLO instruction is executed following the MULT, MULTU, DIV, or 
DIVU instruction, the operation is performed normally. However, if the MFLO, 
MFHI, MTLO, or MTHI instruction is executed following the MTLO instruction, 
the contents of special register HI are undefined.


Operation: 
32, 64 T-2: LO ← undefined
       T-1: LO ← undefined
       T:   LO ← GPR[rs]


Exceptions: 


None 
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MULT Multiply MULT 


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 MULT 
000000 00 0000 0000 011000
6 5 5 10 6 

Format: 
MULT rs, rt 
Description: 


The contents of general purpose registers rs and rt are multiplied, treating both 
operands as 32-bit signed integers. An integer overflow exception never occurs. 


In 64-bit mode, the operands must be valid 32-bit, sign-extended values. 


When the operation completes, the low-order word of the doubleword result is
loaded into special register LO, and the high-order word of the doubleword result
is loaded into special register HI. In 64-bit mode, each result is sign-extended
before it is stored.
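The 64-bit mode result can be modeled in C as follows. This sketch uses hypothetical names and assumes the operands are already valid sign-extended 32-bit values, as required above. Multiplying the sign-extended operands in unsigned 64-bit arithmetic yields the exact two's-complement product, because a signed 32 x 32-bit product always fits in 64 bits.

```c
#include <stdint.h>

/* Sign-extend a 32-bit word to 64 bits: (w31)^32 || w. */
static uint64_t sext32(uint32_t w)
{
    uint64_t r = w;
    if (w & 0x80000000u)
        r |= UINT64_C(0xFFFFFFFF00000000);
    return r;
}

/* MULT in 64-bit mode. The product of the sign-extended operands modulo 2^64
 * is the exact two's-complement 64-bit product of two signed 32-bit values. */
void mult64(uint64_t rs, uint64_t rt, uint64_t *hi, uint64_t *lo)
{
    uint64_t t = sext32((uint32_t)rs) * sext32((uint32_t)rt);
    *lo = sext32((uint32_t)t);           /* (t31)^32 || t31...0  */
    *hi = sext32((uint32_t)(t >> 32));   /* (t63)^32 || t63...32 */
}
```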


If either of the two instructions immediately preceding this instruction is MFHI
or MFLO, the result of that transfer instruction is undefined.


To obtain the correct result, insert two or more other instructions between the
MFHI or MFLO instruction and the MULT instruction.
Operation: 
32 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   t ← GPR[rs] * GPR[rt]
        LO ← t_31...0
        HI ← t_63...32

64 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   t ← GPR[rs]_31...0 * GPR[rt]_31...0
        LO ← (t_31)^32 || t_31...0
        HI ← (t_63)^32 || t_63...32
Exceptions: 
None 
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MULTU Multiply Unsigned MULTU 


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt 0 MULTU
000000 00 0000 0000 011001
6 5 5 10 6 
Format: 
MULTU rs, rt 
Description: 


The contents of general purpose register rs and the contents of general purpose 
register rt are multiplied, treating both operands as 32-bit unsigned values. An 
overflow exception never occurs. 


In 64-bit mode, the operands must be valid 32-bit, sign-extended values. 


When the operation completes, the low-order word of the doubleword result is 
loaded into special register LO, and the high-order word of the doubleword result 
is loaded into special register HI. In 64-bit mode, these results are sign-extended 
and loaded. 
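For comparison with MULT, here is a sketch of MULTU in 64-bit mode (hypothetical names). The operands are zero-extended from bit 31 before they are multiplied. Each half of the product is still sign-extended when it is written to LO and HI.

```c
#include <stdint.h>

/* Sign-extend a 32-bit word to 64 bits: (w31)^32 || w. */
static uint64_t sext32(uint32_t w)
{
    uint64_t r = w;
    if (w & 0x80000000u)
        r |= UINT64_C(0xFFFFFFFF00000000);
    return r;
}

/* MULTU in 64-bit mode: t <- (0 || rs31...0) * (0 || rt31...0). */
void multu64(uint64_t rs, uint64_t rt, uint64_t *hi, uint64_t *lo)
{
    uint64_t t = (uint64_t)(uint32_t)rs * (uint64_t)(uint32_t)rt;
    *lo = sext32((uint32_t)t);           /* (t31)^32 || t31...0  */
    *hi = sext32((uint32_t)(t >> 32));   /* (t63)^32 || t63...32 */
}
```

Multiplying 0xFFFFFFFF by 2 gives the product 0x1FFFFFFFE. LO therefore receives 0xFFFFFFFFFFFFFFFE, and HI receives 1.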


If either of the two preceding instructions is MFHI or MFLO, the result of that
transfer instruction is undefined. To obtain the correct result, insert two or more
other instructions between the MFHI or MFLO instruction and the MULTU
instruction.


Operation: 
32 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   t ← (0 || GPR[rs]) * (0 || GPR[rt])
        LO ← t_31...0
        HI ← t_63...32

64 T-2: LO ← undefined
        HI ← undefined
   T-1: LO ← undefined
        HI ← undefined
   T:   t ← (0 || GPR[rs]_31...0) * (0 || GPR[rt]_31...0)
        LO ← (t_31)^32 || t_31...0
        HI ← (t_63)^32 || t_63...32
Exceptions: 
None 
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NOR Nor NOR 


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 NOR 
000000 00000 100111 
6 5 5 5 5 6 
Format: 
NOR rd, rs, rt 
Description: 


A bitwise logical NOR is performed on the contents of general purpose registers
rs and rt. The result is stored in general purpose register rd.


Operation: 


32, 64 T: GPR[rd] ← GPR[rs] nor GPR[rt]


Exceptions: 


None 
OR Or OR


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 OR 
000000 00000 100101 
6 5 5 5 5 6 
Format: 
OR rd, rs, rt 
Description: 


A bitwise logical OR is performed on the contents of general purpose registers
rs and rt. The result is stored in general purpose register rd.


Operation: 


32, 64 T: GPR[rd] ← GPR[rs] or GPR[rt]


Exceptions: 


None 
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ORI Or Immediate ORI 


31 26 25 21 20 16 15 0 
ORI rs rt immediate 
001101 
6 5 5 16 
Format: 


ORI rt, rs, immediate 


Description: 


A bitwise logical OR is performed on the 16-bit zero-extended immediate and the
contents of general purpose register rs. The result is stored in general purpose
register rt.


Operation: 


32 T: GPR[rt] ← GPR[rs]_31...16 || (immediate or GPR[rs]_15...0)
64 T: GPR[rt] ← GPR[rs]_63...16 || (immediate or GPR[rs]_15...0)


Exceptions: 


None 
SB Store Byte SB
31 26 25 21 20 16 15 0 
SB base rt offset 
101000 
6 5 5 16 
Format: 


SB rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to form a virtual address. The least-significant byte of general
purpose register rt is stored at the memory location specified by the address.


Operation: 


32 T: vAddr ← ((offset15)^16 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor ReverseEndian^3)
      byte ← vAddr2..0 xor BigEndianCPU^3
      data ← GPR[rt]63-8*byte..0 || 0^(8*byte)
      StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor ReverseEndian^3)
      byte ← vAddr2..0 xor BigEndianCPU^3
      data ← GPR[rt]63-8*byte..0 || 0^(8*byte)
      StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)


Exceptions: 


TLB miss exception 
TLB invalid exception 
TLB modification exception 


Bus error exception


Address error exception 
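The sketch below illustrates the byte-lane arithmetic in the SB operation. It computes byte and data exactly as the pseudocode does, assuming ReverseEndian = 0. StoreMemory itself is not modelled; with access type BYTE, only the lane selected by byte is written. The function name sb_bus_data is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sb_bus_data(vaddr: int, rt: int, big_endian_cpu: bool) -> tuple:
    """Return (byte, data) as computed by the SB pseudocode (ReverseEndian = 0)."""
    # byte <- vAddr2..0 xor BigEndianCPU^3
    byte = (vaddr & 0b111) ^ (0b111 if big_endian_cpu else 0)
    # data <- GPR[rt]63-8*byte..0 || 0^(8*byte): shift the low byte into its lane
    data = (rt << (8 * byte)) & MASK64
    return byte, data
```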

CPU Instruction Set Details

SC Store Conditional SC

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SC     | base   | rt     | offset
Value  | 111000 |        |        |
Width  | 6      | 5      | 5      | 16
Format: 


SC rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of general purpose register rt 
are stored at the memory location specified by the address only when the LL bit is 
set. 


If another processor or device modifies the target physical address after the
preceding LL instruction has been executed, or if an ERET instruction is executed
between the LL and SC instructions, the register contents are not stored to
memory and the store fails.


The success or failure of the SC operation is indicated by the contents of general 
purpose register rt after execution of the instruction. A successful SC instruction 
sets the contents of general purpose register rt to 1; an unsuccessful SC instruction 
sets it to 0. 


The operation of SC is undefined when the address is different from the address 
used in the last LL instruction. 


This instruction is available in User mode; CP0 does not need to be enabled.


If either of the low-order two bits of the address is not zero, an address error 
exception takes place. 


If this instruction both fails and causes an exception, the exception takes 
precedence. 


This instruction is defined to maintain software compatibility with the VR4400.

Operation:

32 T: vAddr ← ((offset15)^16 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]31..0
      if LLbit then
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
      endif
      GPR[rt] ← 0^31 || LLbit

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]31..0
      if LLbit then
          StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
      endif
      GPR[rt] ← 0^63 || LLbit

Exceptions:


TLB miss exception 

TLB invalid exception 

TLB modification exception 
Bus error exception 
Address error exception 
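The LL/SC pair is normally used in a retry loop that repeats until SC reports success. The toy Python model below illustrates that pattern under simplifying assumptions:

- There is a single processor.
- A dictionary stands in for memory.
- An explicit external_write method stands in for another processor or device modifying the linked address.

All class and method names are hypothetical. The model does not represent cache or bus behavior.

```python
from typing import Dict, Optional


class LLSCModel:
    """Toy single-processor model of the LL bit (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}
        self.ll_bit = False
        self.ll_addr: Optional[int] = None

    def ll(self, addr: int) -> int:
        """Load Linked: read the word and set the LL bit."""
        self.ll_bit, self.ll_addr = True, addr
        return self.memory.get(addr, 0)

    def external_write(self, addr: int, value: int) -> None:
        """Another processor or device writes memory; a hit clears the LL bit."""
        self.memory[addr] = value
        if addr == self.ll_addr:
            self.ll_bit = False

    def eret(self) -> None:
        """An ERET between LL and SC also makes the SC fail."""
        self.ll_bit = False

    def sc(self, addr: int, value: int) -> int:
        """Store Conditional: store only if the LL bit is set; return the new rt."""
        if addr & 0b11:  # the exception takes precedence over success or failure
            raise ValueError("address error exception")
        if self.ll_bit:
            self.memory[addr] = value & 0xFFFFFFFF
        return int(self.ll_bit)  # GPR[rt] <- 0 || LLbit


def atomic_increment(cpu: LLSCModel, addr: int) -> int:
    """Classic LL/SC retry loop: repeat until the SC succeeds."""
    while True:
        new = (cpu.ll(addr) + 1) & 0xFFFFFFFF
        if cpu.sc(addr, new):
            return new
```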

SCD Store Conditional Doubleword SCD

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SCD    | base   | rt     | offset
Value  | 111100 |        |        |
Width  | 6      | 5      | 5      | 16
Format: 
SCD rt, offset(base) 
Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of general purpose register rt 
are stored at the memory location specified by the address only when the LL bit is 
set. 


If another processor or device modifies the target address after the preceding
LLD instruction has been executed, or if an ERET instruction is executed between
the LLD and SCD instructions, the register contents are not stored to memory and
the store fails.


The success or failure of the SCD operation is indicated by the contents of general
purpose register rt after execution of the instruction. A successful SCD
instruction sets the contents of general purpose register rt to 1; an unsuccessful
SCD instruction sets it to 0.


The operation of SCD is undefined when the address is different from the address 
used in the last LLD. 


This instruction is available in User mode; CP0 does not need to be enabled.


If either of the low-order three bits of the address is not zero, an address error 
exception takes place. 


If this instruction both fails and causes an exception, the exception takes 
precedence. 


This instruction is defined in the 64-bit mode and 32-bit Kernel mode. If this 
instruction is executed in the 32-bit User or Supervisor mode, the reserved 
instruction exception occurs. 


This instruction is defined to maintain software compatibility with the VR4400.

Operation:

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]
      if LLbit then
          StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
      endif
      GPR[rt] ← 0^63 || LLbit


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 


Exceptions: 


TLB miss exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Reserved instruction exception (32-bit User or Supervisor mode) 

SD Store Doubleword SD

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SD     | base   | rt     | offset
Value  | 111111 |        |        |
Width  | 6      | 5      | 5      | 16


Format: 
SD rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of general purpose register rt 
are stored at the memory location specified by the address. 


If any of the low-order three bits of the address is not zero, an address error
exception occurs.


This operation is defined for the VR4300 operating in 64-bit mode and in 32-bit
Kernel mode. Execution of this instruction in 32-bit User or Supervisor mode
causes a reserved instruction exception.


Operation: 


32 T: vAddr ← ((offset15)^16 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]
      StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]
      StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 


Exceptions: 


TLB miss exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Reserved instruction exception (32-bit User or Supervisor mode) 

SDCz Store Doubleword From Coprocessor z SDCz

Bits   | 31..26  | 25..21 | 20..16 | 15..0
Field  | SDCz    | base   | rt     | offset
Value  | 1111xx* |        |        |
Width  | 6       | 5      | 5      | 16
Format: 


SDCz rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. Register rt of coprocessor unit z sources a 
doubleword, which the processor writes to the addressed memory location. The 
stored data is defined by individual coprocessor specifications. 


If any of the low-order three bits of the address is not zero, an address error 
exception takes place. 


This instruction is not valid for use with CP0.


If CP1 is specified, the FR bit of the Status register is 0, and the least-significant
bit of the rt field is not 0, the operation of this instruction is undefined. If the FR
bit is 1, rt can specify both odd- and even-numbered registers.


* Refer to the Opcode Bit Encoding table below, or to 16.7 CPU Instruction Opcode
Bit Encoding.


Operation: 


32 T: vAddr ← ((offset15)^16 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]
      StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]
      StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)


Exceptions: 
TLB miss exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 


Opcode Bit Encoding:

Bit #  | 31 30 29 28 | 27 26 | 25 ... 0
SDC1   |  1  1  1  1 |  0  1 |
SDC2   |  1  1  1  1 |  1  0 |
       | Opcode      | Coprocessor Number

SDL Store Doubleword Left SDL

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SDL    | base   | rt     | offset
Value  | 101100 |        |        |
Width  | 6      | 5      | 5      | 16
Format: 


SDL rt, offset(base) 


Description: 


This instruction is used together with the SDR instruction to store a doubleword
from a register to a memory doubleword that is not aligned on a doubleword
boundary. The SDL instruction stores the high-order portion of the data to
memory, and the SDR instruction stores the low-order portion.


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to generate a virtual address, which specifies the most-significant
byte of the unaligned doubleword in memory. Only the high-order portion of general
purpose register rt is stored, and only to the bytes that lie within the same aligned
doubleword as the target address. Depending on the address specified, 1 to 8 bytes
are stored.


In other words, the most-significant byte of general purpose register rt is stored
to the addressed byte first. Then, as long as the next memory byte lies within the
same aligned doubleword, the next lower-order byte of rt is stored to it, and so on.


Example (big-endian memory), SDL $24,1($0) with $24 = A B C D E F G H:

             before                       after
address 8:   8  9  10 11 12 13 14 15      8  9  10 11 12 13 14 15
address 0:   0  1  2  3  4  5  6  7       0  A  B  C  D  E  F  G


User’s Manual U10504EJ7V0UM00 495


Chapter 16 




No address error exception occurs even if the specified address is not aligned on
a doubleword boundary. This operation is defined in 64-bit mode and 32-bit Kernel
mode. If this instruction is executed in 32-bit User or Supervisor mode, a reserved
instruction exception occurs.


Operation: 


64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr ← pAddr31..3 || 0^3
      endif
      byte ← vAddr2..0 xor BigEndianCPU^3
      data ← 0^(56-8*byte) || GPR[rt]63..56-8*byte
      StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 


The relationship between the address given to the SDL instruction and the result
(the bytes of the doubleword in memory) is shown below:


SDL
Register: A B C D E F G H        Memory: I J K L M N O P

          | BigEndianCPU = 0                  | BigEndianCPU = 1
          |                      offset       |                      offset
vAddr2..0 | destination   type   LEM   BEM    | destination   type   LEM   BEM
    0     | IJKLMNOA       0      0     7     | ABCDEFGH       7      0     0
    1     | IJKLMNAB       1      0     6     | IABCDEFG       6      0     1
    2     | IJKLMABC       2      0     5     | IJABCDEF       5      0     2
    3     | IJKLABCD       3      0     4     | IJKABCDE       4      0     3
    4     | IJKABCDE       4      0     3     | IJKLABCD       3      0     4
    5     | IJABCDEF       5      0     2     | IJKLMABC       2      0     5
    6     | IABCDEFG       6      0     1     | IJKLMNAB       1      0     6
    7     | ABCDEFGH       7      0     0     | IJKLMNOA       0      0     7

Remark Type:   access type output to memory (Refer to Figure 3-2 Byte Access
               within a Doubleword.)
       Offset: pAddr2..0 output to memory
       LEM:    Little-endian memory (BigEndianMem = 0)
       BEM:    Big-endian memory (BigEndianMem = 1)
Exceptions: 


TLB miss exception 

TLB invalid exception 
TLB modification exception 
Bus error exception 


Address error exception 


Reserved instruction exception (32-bit User or Supervisor mode) 
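The following Python sketch reproduces the byte placement in the SDL table by applying SDL to a byte array indexed by address. It assumes BigEndianCPU and BigEndianMem have the same value (the big_endian argument) and that ReverseEndian = 0. The function name sdl is illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sdl(mem: bytearray, vaddr: int, rt: int, big_endian: bool = True) -> None:
    """Apply SDL to `mem` (assumes BigEndianCPU == BigEndianMem, ReverseEndian = 0)."""
    base, off = vaddr & ~0b111, vaddr & 0b111  # aligned doubleword, byte offset
    rt &= MASK64
    if big_endian:
        # Bytes vaddr .. base+7 receive A, B, C, ... (most-significant byte first).
        mem[vaddr:base + 8] = (rt >> (8 * off)).to_bytes(8 - off, "big")
    else:
        # Bytes base .. vaddr receive the off+1 high-order bytes; A lands at vaddr.
        mem[base:vaddr + 1] = (rt >> (8 * (7 - off))).to_bytes(off + 1, "little")
```

Calling sdl(mem, 1, 0x4142434445464748) on big-endian memory leaves bytes 0 A B C D E F G at addresses 0 to 7, matching the SDL $24,1($0) example.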

SDR Store Doubleword Right SDR

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SDR    | base   | rt     | offset
Value  | 101101 |        |        |
Width  | 6      | 5      | 5      | 16
Format: 


SDR rt, offset(base) 


Description: 


This instruction is used together with the SDL instruction to store a doubleword
from a register to a memory doubleword that is not aligned on a doubleword
boundary. The SDL instruction stores the high-order portion of the data to
memory, and the SDR instruction stores the low-order portion.


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to generate a virtual address, which specifies the least-significant
byte of the unaligned doubleword in memory. Only the low-order portion of general
purpose register rt is stored, and only to the bytes that lie within the same aligned
doubleword as the target address. Depending on the address specified, 1 to 8 bytes
are stored.


In other words, the least-significant byte of general purpose register rt is stored
to the addressed byte first. Then, as long as the next memory byte lies within the
same aligned doubleword, the next higher-order byte of rt is stored to it, and so on.


Example (big-endian memory), SDR $24,10($0) with $24 = A B C D E F G H:

             before                       after
address 8:   8  9  10 11 12 13 14 15      F  G  H  11 12 13 14 15
address 0:   0  1  2  3  4  5  6  7       0  1  2  3  4  5  6  7


No address error exception occurs even if the specified address is not aligned on
a doubleword boundary. This operation is defined in 64-bit mode and 32-bit Kernel
mode. If this instruction is executed in 32-bit User or Supervisor mode, a reserved
instruction exception occurs.


Operation: 


64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr ← pAddrPSIZE-1..3 || 0^3
      endif
      byte ← vAddr2..0 xor BigEndianCPU^3
      data ← GPR[rt]63-8*byte..0 || 0^(8*byte)
      StoreMemory (uncached, DOUBLEWORD-byte, data, pAddr, vAddr, DATA)


Remark In the 32-bit Kernel mode, the high-order 32 bits are ignored during 
virtual address creation. 


The relationship between the address given to the SDR instruction and the result
(the bytes of the doubleword in memory) is shown below:


SDR
Register: A B C D E F G H        Memory: I J K L M N O P

          | BigEndianCPU = 0                  | BigEndianCPU = 1
          |                      offset       |                      offset
vAddr2..0 | destination   type   LEM   BEM    | destination   type   LEM   BEM
    0     | ABCDEFGH       7      0     0     | HJKLMNOP       0      7     0
    1     | BCDEFGHP       6      1     0     | GHKLMNOP       1      6     0
    2     | CDEFGHOP       5      2     0     | FGHLMNOP       2      5     0
    3     | DEFGHNOP       4      3     0     | EFGHMNOP       3      4     0
    4     | EFGHMNOP       3      4     0     | DEFGHNOP       4      3     0
    5     | FGHLMNOP       2      5     0     | CDEFGHOP       5      2     0
    6     | GHKLMNOP       1      6     0     | BCDEFGHP       6      1     0
    7     | HJKLMNOP       0      7     0     | ABCDEFGH       7      0     0

Remark Type:   access type output to memory (Refer to Figure 3-2 Byte Access
               within a Doubleword.)
       Offset: pAddr2..0 output to memory
       LEM:    Little-endian memory (BigEndianMem = 0)
       BEM:    Big-endian memory (BigEndianMem = 1)
Exceptions: 


TLB miss exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Reserved instruction exception (32-bit User or Supervisor mode) 
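The companion Python sketch below applies SDR to a byte array indexed by address. It is consistent with the SDR table and the SDR $24,10($0) example. The same assumptions apply: BigEndianCPU equals BigEndianMem (the big_endian argument), and ReverseEndian = 0. The function name sdr is illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sdr(mem: bytearray, vaddr: int, rt: int, big_endian: bool = True) -> None:
    """Apply SDR to `mem` (assumes BigEndianCPU == BigEndianMem, ReverseEndian = 0)."""
    base, off = vaddr & ~0b111, vaddr & 0b111  # aligned doubleword, byte offset
    rt &= MASK64
    if big_endian:
        # Bytes base .. vaddr receive the off+1 low-order bytes; H lands at vaddr.
        n = off + 1
        mem[base:vaddr + 1] = (rt & ((1 << (8 * n)) - 1)).to_bytes(n, "big")
    else:
        # Bytes vaddr .. base+7 receive H, G, F, ... (least-significant byte first).
        n = 8 - off
        mem[vaddr:base + 8] = (rt & ((1 << (8 * n)) - 1)).to_bytes(n, "little")
```

On big-endian memory, issuing SDL at address A and SDR at address A + 7 together stores all eight bytes of rt to the unaligned doubleword that starts at A.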

SH Store Halfword SH

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SH     | base   | rt     | offset
Value  | 101001 |        |        |
Width  | 6      | 5      | 5      | 16
Format: 


SH rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The least-significant halfword of register 
rt is stored in the memory specified by the address. 


If the least-significant bit of the address is not zero, an address error exception
occurs.


Operation: 


32 T: vAddr ← ((offset15)^16 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor (ReverseEndian^2 || 0))
      byte ← vAddr2..0 xor (BigEndianCPU^2 || 0)
      data ← GPR[rt]63-8*byte..0 || 0^(8*byte)
      StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor (ReverseEndian^2 || 0))
      byte ← vAddr2..0 xor (BigEndianCPU^2 || 0)
      data ← GPR[rt]63-8*byte..0 || 0^(8*byte)
      StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)


Exceptions: 


TLB miss exception 

TLB invalid exception 

TLB modification exception 
Bus error exception 
Address error exception 

SLL Shift Left Logical SLL

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | 0      | rt     | rd     | sa    | SLL
Value  | 000000  | 00000  |        |        |       | 000000
Width  | 6       | 5      | 5      | 5      | 5     | 6
Format: 
SLL rd, rt, sa 
Description: 


The contents of general purpose register rt are shifted left by sa bits, inserting
zeros into the low-order bits. The result is stored in general purpose register rd.
In 64-bit mode, the shifted 32-bit value is sign-extended to 64 bits before it is
stored. With a shift amount of 0, the instruction therefore sign-extends the
low-order 32 bits of rt. This makes it usable for generating a 64-bit value that is
the sign extension of a 32-bit value.


Operation: 


32 T: GPR[rd] ← GPR[rt]31-sa..0 || 0^sa

64 T: s ← 0 || sa
      temp ← GPR[rt]31-s..0 || 0^s
      GPR[rd] ← (temp31)^32 || temp


Exceptions: 


None 


Caution If the shift amount of this instruction is 0, the assembler may treat
this instruction as a NOP. When using this instruction for sign
extension, check the specifications of the assembler.
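The following Python sketch models SLL on 64-bit register images. The function name and the mode64 flag are assumptions. It demonstrates the behavior described for 64-bit mode: bit 31 of the shifted word is copied into bits 63..32, so a shift amount of 0 sign-extends the low-order word.

```python
MASK32 = 0xFFFFFFFF


def sll(rt: int, sa: int, mode64: bool = True) -> int:
    """SLL: shift the low word of rt left; in 64-bit mode sign-extend bit 31."""
    word = (rt << (sa & 0x1F)) & MASK32  # GPR[rt]31-sa..0 || 0^sa
    if mode64 and word & 0x80000000:
        word |= 0xFFFFFFFF00000000  # (temp31)^32 || temp
    return word
```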

SLLV Shift Left Logical Variable SLLV

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | rs     | rt     | rd     | 0     | SLLV
Value  | 000000  |        |        |        | 00000 | 000100
Width  | 6       | 5      | 5      | 5      | 5     | 6
Format: 


SLLV rd, rt, rs 


Description: 


The contents of general purpose register rt are shifted left by the number of bits
specified by the low-order five bits of general purpose register rs, inserting
zeros into the low-order bits. The result is stored in general purpose register rd.
In 64-bit mode, the shifted 32-bit value is sign-extended to 64 bits before it is
stored. With a shift amount of 0, the instruction therefore sign-extends the
low-order 32 bits of rt. This makes it usable for generating a 64-bit value that is
the sign extension of a 32-bit value.


Operation: 


32 T: s ← GPR[rs]4..0
      GPR[rd] ← GPR[rt](31-s)..0 || 0^s

64 T: s ← 0 || GPR[rs]4..0
      temp ← GPR[rt](31-s)..0 || 0^s
      GPR[rd] ← (temp31)^32 || temp


Exceptions: 


None 


Caution If the shift amount of this instruction is 0, the assembler may treat
this instruction as a NOP. When using this instruction for sign
extension, check the specifications of the assembler.

SLT Set On Less Than SLT

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | rs     | rt     | rd     | 0     | SLT
Value  | 000000  |        |        |        | 00000 | 101010
Width  | 6       | 5      | 5      | 5      | 5     | 6
Format: 
SLT rd, rs, rt 
Description: 
The contents of general purpose register rt are subtracted from the contents of
general purpose register rs. Treating both values as signed integers, if the
contents of general purpose register rs are less than the contents of general
purpose register rt, 1 is stored in general purpose register rd; otherwise 0 is
stored in general purpose register rd.
An integer overflow exception never occurs. The comparison is valid even if the 
subtraction used during the comparison overflows. 
Operation: 
32 T: if GPR[rs] < GPR[rt] then
          GPR[rd] ← 0^31 || 1
      else
          GPR[rd] ← 0^32
      endif

64 T: if GPR[rs] < GPR[rt] then
          GPR[rd] ← 0^63 || 1
      else
          GPR[rd] ← 0^64
      endif
Exceptions: 
None 
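The difference between signed and unsigned comparison is easiest to see with concrete values. This Python sketch models SLT on 64-bit register images, with SLTU alongside for contrast. The helper names are illustrative. Python integers do not overflow, so the comparison is exact. This matches the statement that the result is valid even when the subtraction overflows.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register image as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def slt(rs: int, rt: int) -> int:
    """SLT: 1 if rs < rt as signed integers, else 0."""
    return int(to_signed64(rs) < to_signed64(rt))


def sltu(rs: int, rt: int) -> int:
    """SLTU: the same comparison with both operands treated as unsigned."""
    return int((rs & MASK64) < (rt & MASK64))
```

For example, with rs = 0xFFFF_FFFF_FFFF_FFFF (-1 when read as signed) and rt = 1, SLT stores 1 but SLTU stores 0.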

SLTI Set On Less Than Immediate SLTI

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SLTI   | rs     | rt     | immediate
Value  | 001010 |        |        |
Width  | 6      | 5      | 5      | 16

Format:

SLTI rt, rs, immediate

Description:


The 16-bit immediate is sign-extended and subtracted from the contents of general
purpose register rs. Treating both values as signed integers, if the contents of rs
are less than the sign-extended immediate, 1 is stored in general purpose register
rt; otherwise 0 is stored in general purpose register rt.

An integer overflow exception never occurs. The comparison is valid even if the
subtraction overflows.


Operation: 
32 T: if GPR[rs] < (immediate15)^16 || immediate15..0 then
          GPR[rt] ← 0^31 || 1
      else
          GPR[rt] ← 0^32
      endif

64 T: if GPR[rs] < (immediate15)^48 || immediate15..0 then
          GPR[rt] ← 0^63 || 1
      else
          GPR[rt] ← 0^64
      endif
Exceptions: 
None 

SLTIU Set On Less Than Immediate Unsigned SLTIU

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SLTIU  | rs     | rt     | immediate
Value  | 001011 |        |        |
Width  | 6      | 5      | 5      | 16
Format: 


SLTIU rt, rs, immediate 


Description: 


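The 16-bit immediate is sign-extended and subtracted from the contents of general
purpose register rs. Treating both values as unsigned integers, if the contents of
rs are less than the sign-extended immediate, 1 is stored in general purpose
register rt; otherwise 0 is stored in general purpose register rt. Because the
immediate is sign-extended before the unsigned comparison, an immediate in the
range 0x8000 to 0xFFFF compares as a value at the top of the unsigned range.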


An integer overflow exception never occurs. The comparison is valid even if the 
subtraction overflows. 


Operation: 


32 T: if (0 || GPR[rs]) < (immediate15)^16 || immediate15..0 then
          GPR[rt] ← 0^31 || 1
      else
          GPR[rt] ← 0^32
      endif

64 T: if (0 || GPR[rs]) < (immediate15)^48 || immediate15..0 then
          GPR[rt] ← 0^63 || 1
      else
          GPR[rt] ← 0^64
      endif


Exceptions: 


None 
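A sign-extended immediate compared as an unsigned value is a common source of surprises. The following Python sketch models SLTIU on 64-bit register images. The function name is an assumption.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sltiu(rs: int, immediate: int) -> int:
    """SLTIU: sign-extend the 16-bit immediate, then compare as unsigned."""
    imm = immediate & 0xFFFF
    if imm & 0x8000:
        imm |= MASK64 & ~0xFFFF  # (immediate15)^48 || immediate15..0
    return int((rs & MASK64) < imm)
```

One consequence is that SLTIU rt, rs, 1 sets rt to 1 exactly when rs is 0.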


SLTU Set On Less Than Unsigned SLTU

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | rs     | rt     | rd     | 0     | SLTU
Value  | 000000  |        |        |        | 00000 | 101011
Width  | 6       | 5      | 5      | 5      | 5     | 6

Format: 
SLTU rd, rs, rt 
Description: 


The contents of general purpose register rt are subtracted from the contents of
general purpose register rs. Treating both values as unsigned integers, if the
contents of general purpose register rs are less than the contents of general
purpose register rt, 1 is stored in general purpose register rd; otherwise 0 is
stored in general purpose register rd.

An integer overflow exception never occurs. The comparison is valid even if the
subtraction overflows.


Operation: 
32 T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
          GPR[rd] ← 0^31 || 1
      else
          GPR[rd] ← 0^32
      endif

64 T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
          GPR[rd] ← 0^63 || 1
      else
          GPR[rd] ← 0^64
      endif
Exceptions: 
None 

SRA Shift Right Arithmetic SRA

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | 0      | rt     | rd     | sa    | SRA
Value  | 000000  | 00000  |        |        |       | 000011
Width  | 6       | 5      | 5      | 5      | 5     | 6

Format: 
SRA rd, rt, sa 
Description: 


The contents of general purpose register rt are shifted right by sa bits, and the
sign bit is replicated into the vacated high-order bits. The result is stored in
general purpose register rd. In 64-bit mode, the shifted 32-bit value is
sign-extended to 64 bits before it is stored.


Operation: 


32 T: GPR[rd] ← (GPR[rt]31)^sa || GPR[rt]31..sa

64 T: s ← 0 || sa
      temp ← (GPR[rt]31)^s || GPR[rt]31..s
      GPR[rd] ← (temp31)^32 || temp


Exceptions: 


None 
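Here is a minimal Python sketch of SRA on register images. The function name and the mode64 flag are assumptions. Python's >> operator is an arithmetic shift on negative integers, so converting the low word to a signed value first gives the sign-bit replication directly.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sra(rt: int, sa: int, mode64: bool = True) -> int:
    """SRA: arithmetic right shift of the low word of rt."""
    word = rt & MASK32
    signed = word - (1 << 32) if word & 0x80000000 else word
    result = signed >> (sa & 0x1F)  # replicates the sign bit
    # Masking a negative result to 64 bits yields the sign-extended register image.
    return result & (MASK64 if mode64 else MASK32)
```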

SRAV Shift Right Arithmetic Variable SRAV

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | rs     | rt     | rd     | 0     | SRAV
Value  | 000000  |        |        |        | 00000 | 000111
Width  | 6       | 5      | 5      | 5      | 5     | 6
6 5 5 5 5 6 

Format: 
SRAV rd, rt, rs 
Description: 


The contents of general purpose register rt are shifted right by the number of bits 
specified by the low-order five bits of general purpose register rs, sign-extending 
the high-order bits. The result is stored in the general purpose register rd. In 64- 
bit mode, the sign-extended 32-bit value is stored as the result. 


Operation: 


32 T: s ← GPR[rs]4..0
      GPR[rd] ← (GPR[rt]31)^s || GPR[rt]31..s

64 T: s ← GPR[rs]4..0
      temp ← (GPR[rt]31)^s || GPR[rt]31..s
      GPR[rd] ← (temp31)^32 || temp


Exceptions: 


None 

SRL Shift Right Logical SRL

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | 0      | rt     | rd     | sa    | SRL
Value  | 000000  | 00000  |        |        |       | 000010
Width  | 6       | 5      | 5      | 5      | 5     | 6


Format: 
SRL rd, rt, sa 


Description: 


The contents of general purpose register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. The result is stored in the general purpose register 
rd. In 64-bit mode, the sign-extended 32-bit value is stored as the result. 


Operation: 


32 T: GPR[rd] ← 0^sa || GPR[rt]31..sa

64 T: s ← 0 || sa
      temp ← 0^s || GPR[rt]31..s
      GPR[rd] ← (temp31)^32 || temp


Exceptions: 


None 
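Sign-extending the 32-bit result in 64-bit mode has a consequence that is easy to overlook. SRL with a shift amount of 0 produces a negative 64-bit value whenever bit 31 of rt is set, whereas any nonzero shift clears bit 31. The Python sketch below makes this concrete. The function name and the mode64 flag are assumptions.

```python
MASK32 = 0xFFFFFFFF


def srl(rt: int, sa: int, mode64: bool = True) -> int:
    """SRL: logical right shift of the low word; 64-bit mode sign-extends bit 31."""
    word = (rt & MASK32) >> (sa & 0x1F)  # 0^sa || GPR[rt]31..sa
    if mode64 and word & 0x80000000:  # only possible when sa == 0
        word |= 0xFFFFFFFF00000000
    return word
```

SRLV behaves the same way, except that the shift amount is taken from the low-order five bits of rs.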

SRLV Shift Right Logical Variable SRLV

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | rs     | rt     | rd     | 0     | SRLV
Value  | 000000  |        |        |        | 00000 | 000110
Width  | 6       | 5      | 5      | 5      | 5     | 6

Format: 
SRLV rd, rt, rs 
Description: 


The contents of general purpose register rt are shifted right by the number of bits 
specified by the low-order five bits of general purpose register rs, inserting zeros 
into the high-order bits. The result is stored in the general purpose register rd. In 
64-bit mode, the sign-extended 32-bit value is stored as the result. 


Operation: 


32 T: s ← GPR[rs]4..0
      GPR[rd] ← 0^s || GPR[rt]31..s

64 T: s ← GPR[rs]4..0
      temp ← 0^s || GPR[rt]31..s
      GPR[rd] ← (temp31)^32 || temp


Exceptions: 


None 

SUB Subtract SUB

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | rs     | rt     | rd     | 0     | SUB
Value  | 000000  |        |        |        | 00000 | 100010
Width  | 6       | 5      | 5      | 5      | 5     | 6


Format: 
SUB rd, rs, rt 


Description: 


The contents of general purpose register rt are subtracted from the contents of
general purpose register rs, and the result is stored in general purpose register
rd. In 64-bit mode, the 32-bit result is sign-extended to 64 bits before it is stored.


An integer overflow exception occurs if the carries out of bits 30 and 31 differ (2’s 
complement overflow). The destination register rd is not modified when an 
integer overflow exception occurs. 


Operation: 


32 T: GPR[rd] ← GPR[rs] - GPR[rt]

64 T: temp ← GPR[rs] - GPR[rt]
      GPR[rd] ← (temp31)^32 || temp31..0


Exceptions: 


Integer overflow exception 

SUBU Subtract Unsigned SUBU

Bits   | 31..26  | 25..21 | 20..16 | 15..11 | 10..6 | 5..0
Field  | SPECIAL | rs     | rt     | rd     | 0     | SUBU
Value  | 000000  |        |        |        | 00000 | 100011
Width  | 6       | 5      | 5      | 5      | 5     | 6


Format: 
SUBU rd, rs, rt 


Description: 


The contents of general purpose register rt are subtracted from the contents of
general purpose register rs, and the result is stored in general purpose register
rd. In 64-bit mode, the 32-bit result is sign-extended to 64 bits before it is stored.


The only difference between this instruction and the SUB instruction is that 
SUBU never causes an integer overflow exception. 


Operation: 


32 T: GPR[rd] ← GPR[rs] - GPR[rt]

64 T: temp ← GPR[rs] - GPR[rt]
      GPR[rd] ← (temp31)^32 || temp31..0


Exceptions: 


None 
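The following Python sketch contrasts SUB and SUBU in 64-bit mode. As a simplification, it assumes that only the low-order 32 bits of each operand participate. It signals overflow with a hypothetical IntegerOverflow exception. Overflow is detected by checking whether the exact difference fits in a signed 32-bit integer, which is equivalent to the carries out of bits 30 and 31 differing.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class IntegerOverflow(Exception):
    """Stand-in for the integer overflow exception (hypothetical name)."""


def _signed32(value: int) -> int:
    """Interpret the low word of a register image as a signed 32-bit integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def subu(rs: int, rt: int) -> int:
    """SUBU: 32-bit wrap-around difference, sign-extended to 64 bits."""
    diff = (_signed32(rs) - _signed32(rt)) & MASK32
    return diff | 0xFFFFFFFF00000000 if diff & 0x80000000 else diff


def sub(rs: int, rt: int) -> int:
    """SUB: like SUBU, but raise instead of writing rd on 32-bit overflow."""
    exact = _signed32(rs) - _signed32(rt)
    if not -(1 << 31) <= exact < (1 << 31):
        raise IntegerOverflow("rd is left unmodified")
    return exact & MASK64  # in range, so masking to 64 bits sign-extends
```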


SW Store Word SW

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SW     | base   | rt     | offset
Value  | 101011 |        |        |
Width  | 6      | 5      | 5      | 16
Format: 


SW rt, offset(base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to form a virtual address. The contents of general purpose register rt
are stored at the memory location specified by the address. If either of the
low-order two bits of the address is not zero, an address error exception occurs.


Operation: 

32 T: vAddr ← ((offset15)^16 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]31..0
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      data ← GPR[rt]31..0
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)


Exceptions: 


TLB miss exception 

TLB invalid exception 

TLB modification exception 
Bus error exception 
Address error exception 

SWCz Store Word From Coprocessor z SWCz

Bits   | 31..26  | 25..21 | 20..16 | 15..0
Field  | SWCz    | base   | rt     | offset
Value  | 1110xx* |        |        |
Width  | 6       | 5      | 5      | 16

Format:

SWCz rt, offset(base)

Description:


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to form a virtual address. Coprocessor register rt of CPz is stored
to the addressed memory location. The data to be stored is defined by the individual
coprocessor specifications. This instruction is not valid for use with CP0.


If either of the low-order two bits of the address is not zero, an address error 
exception occurs. 


Operation: 


32 T: vAddr ← ((offset15)^16 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor (ReverseEndian || 0^2))
      byte ← vAddr2..0 xor (BigEndianCPU || 0^2)
      data ← COPzSW (byte, rt)
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset15)^48 || offset15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddrPSIZE-1..3 || (pAddr2..0 xor (ReverseEndian || 0^2))
      byte ← vAddr2..0 xor (BigEndianCPU || 0^2)
      data ← COPzSW (byte, rt)
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

* Refer to the Opcode Bit Encoding table below, or to 16.7 CPU Instruction Opcode
Bit Encoding.


Exceptions: 


TLB miss exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 


Opcode Bit Encoding:

Bit #  | 31 30 29 28 | 27 26 | 25 ... 0
SWC1   |  1  1  1  0 |  0  1 |
SWC2   |  1  1  1  0 |  1  0 |
       | Opcode      | Coprocessor Number

SWL Store Word Left SWL

Bits   | 31..26 | 25..21 | 20..16 | 15..0
Field  | SWL    | base   | rt     | offset
Value  | 101010 |        |        |
Width  | 6      | 5      | 5      | 16


Format: 


SWL rt, offset(base) 


Description: 


This instruction is used together with the SWR instruction to store a word from a
register to a memory word that is not aligned on a word boundary. The SWL
instruction stores the high-order portion of the data to memory, and the SWR
instruction stores the low-order portion.


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to generate a virtual address, which specifies the most-significant
byte of the unaligned word in memory. Only the high-order portion of general
purpose register rt is stored, and only to the bytes that lie within the same aligned
word as the target address.


Depending on the address specified, 1 to 4 bytes are stored.


In other words, the most-significant byte of the word in general purpose register
rt (bits 31..24) is stored to the addressed byte first. Then, as long as the next
memory byte lies within the same aligned word, the next lower-order byte of rt is
stored to it, and so on.


No address error exception occurs even if the specified address is not aligned on
a word boundary.


            memory (big-endian)          register
address 4   4  5  6  7   before          A  B  C  D   $24
address 0   0  1  2  3   storing

                SWL $24,1($0)

address 4   4  5  6  7   after
address 0   0  A  B  C   storing
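The byte movement in the example above can be checked with a small model. This is a minimal sketch, not the manual's definition: it assumes a 32-bit register value, big-endian memory held in a Python `bytearray`, and no address translation or exceptions. `swl_big_endian` is a hypothetical helper name.

```python
def swl_big_endian(mem: bytearray, addr: int, reg: int) -> None:
    """Model SWL on big-endian memory: store the high-order bytes of a 32-bit
    register from addr up to the end of the aligned word containing addr."""
    reg_bytes = (reg & 0xFFFFFFFF).to_bytes(4, "big")  # most-significant byte first
    count = 4 - (addr & 3)                             # 1 to 4 bytes, depending on addr
    mem[addr:addr + count] = reg_bytes[:count]


# Reproduce the example: memory bytes 0..7, $24 = A B C D, then SWL $24,1($0)
mem = bytearray(range(8))
swl_big_endian(mem, 1, 0x0A0B0C0D)
print(list(mem))  # [0, 10, 11, 12, 4, 5, 6, 7], i.e. address 0 now holds 0 A B C
```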
SWL Store Word Left (Continued) SWL


Operation:

32 T: vAddr ← ((offset_15)^16 || offset_15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr ← pAddr_31..2 || 0^2
      endif
      byte ← vAddr_1..0 xor BigEndianCPU^2
      if (vAddr_2 xor BigEndianCPU) = 0 then
          data ← 0^32 || 0^(24-8*byte) || GPR[rt]_31..24-8*byte
      else
          data ← 0^(24-8*byte) || GPR[rt]_31..24-8*byte || 0^32
      endif
      StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr ← pAddr_31..2 || 0^2
      endif
      byte ← vAddr_1..0 xor BigEndianCPU^2
      if (vAddr_2 xor BigEndianCPU) = 0 then
          data ← 0^32 || 0^(24-8*byte) || GPR[rt]_31..24-8*byte
      else
          data ← 0^(24-8*byte) || GPR[rt]_31..24-8*byte || 0^32
      endif
      StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
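The `byte` and `data` computation in the Operation above can be transliterated into Python and checked against the relationship table that follows. This is a sketch under stated assumptions: `rt` is the low-order 32 bits of GPR[rt], `big_endian_cpu` is 0 or 1, and `data` is the 64-bit doubleword handed to StoreMemory. Address translation and the `pAddr` adjustment are left out, and `swl_data` is a hypothetical name.

```python
def swl_data(vaddr: int, rt: int, big_endian_cpu: int) -> tuple[int, int]:
    """Return (byte, data) as computed by the SWL Operation pseudocode.

    byte doubles as the access type: 0 means one byte is stored, 3 means four.
    """
    byte = (vaddr & 0b11) ^ (0b11 if big_endian_cpu else 0b00)  # vAddr_1..0 xor BigEndianCPU^2
    field = (rt & 0xFFFFFFFF) >> (24 - 8 * byte)                # GPR[rt]_31..24-8*byte
    if (((vaddr >> 2) & 1) ^ big_endian_cpu) == 0:
        data = field            # 0^32 || 0^(24-8*byte) || field
    else:
        data = field << 32      # 0^(24-8*byte) || field || 0^32
    return byte, data
```

For example, on a big-endian CPU with vAddr = 1, `byte` is 2 (a three-byte store) and the register bytes A B C occupy bits 55..32 of the doubleword, which is the "0 A B C" result in the example above.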
SWL Store Word Left (Continued) SWL


The relationships between the contents given to the SWL instruction and the result 
(bytes for words in the memory) are shown below: 


SWL
Register:  A B C D E F G H
Memory:    I J K L M N O P

            BigEndianCPU = 0                       BigEndianCPU = 1
                                   offset                                 offset
vAddr_2..0  destination      type  LEM  BEM        destination      type  LEM  BEM
0           I J K L M N O E   0     0    7         E F G H M N O P   3     4    0
1           I J K L M N E F   1     0    6         I E F G M N O P   2     4    1
2           I J K L M E F G   2     0    5         I J E F M N O P   1     4    2
3           I J K L E F G H   3     0    4         I J K E M N O P   0     4    3
4           I J K E M N O P   0     4    3         I J K L E F G H   3     0    4
5           I J E F M N O P   1     4    2         I J K L M E F G   2     0    5
6           I E F G M N O P   2     4    1         I J K L M N E F   1     0    6
7           E F G H M N O P   3     4    0         I J K L M N O E   0     0    7

Remark  Type:   access type output to memory (Refer to Figure 3-2 Byte
                Access within a Doubleword.)
        Offset: pAddr_2..0 output to memory
        LEM     Little-endian memory (BigEndianMem = 0)
        BEM     Big-endian memory (BigEndianMem = 1)
Exceptions:

TLB miss exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
SWR Store Word Right SWR

31 26 25 21 20 16 15 0
SWR base rt offset
101110
6 5 5 16


Format: 


SWR rt, offset(base) 


Description: 


This instruction is used together with the SWL instruction to store a word from a
register to a word in memory that is not aligned on a word boundary. SWL stores
the high-order portion of the data to memory, and SWR stores the low-order
portion.


The 16-bit offset is sign-extended and added to the contents of general purpose
register base to generate a virtual address. The generated address specifies the
least-significant byte of an unaligned word in memory. Only the part of that word
that lies within the same aligned word as the target address is written, using the
low-order portion of general purpose register rt. Depending on the address
specified, from 1 to 4 bytes are stored.


In other words, the least-significant byte of general purpose register rt is first
stored to the addressed memory byte. Then, as long as the next byte of memory lies
within the same aligned word, the next higher-order byte of the register is stored
to it; this repeats until the word boundary is reached.


An address that is not on a word boundary does not cause an address error
exception.


            memory (big-endian)          register
address 4   4  5  6  7   before          A  B  C  D   $24
address 0   0  1  2  3   storing

                SWR $24,4($0)

address 4   D  5  6  7   after
address 0   0  1  2  3   storing
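The SWR example above can be modeled the same way. This is a minimal sketch, assuming a 32-bit register value, big-endian memory held in a `bytearray`, and no translation or exceptions; `swr_big_endian` is a hypothetical helper name.

```python
def swr_big_endian(mem: bytearray, addr: int, reg: int) -> None:
    """Model SWR on big-endian memory: store the low-order bytes of a 32-bit
    register from the start of the aligned word up to and including addr."""
    reg_bytes = (reg & 0xFFFFFFFF).to_bytes(4, "big")  # most-significant byte first
    count = (addr & 3) + 1                             # 1 to 4 bytes, depending on addr
    word_start = addr & ~3
    mem[word_start:addr + 1] = reg_bytes[4 - count:]


# Reproduce the example: $24 = A B C D, then SWR $24,4($0)
mem = bytearray(range(8))
swr_big_endian(mem, 4, 0x0A0B0C0D)
print(list(mem))  # [0, 1, 2, 3, 13, 5, 6, 7], i.e. address 4 now holds D 5 6 7
```

On a big-endian CPU, an unaligned word store to address A is commonly written as SWL at A followed by SWR at A + 3, so that the two partial stores together cover all four bytes.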
SWR Store Word Right (Continued) SWR


Operation:

32 T: vAddr ← ((offset_15)^16 || offset_15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr ← pAddr_31..2 || 0^2
      endif
      byte ← vAddr_1..0 xor BigEndianCPU^2
      if (vAddr_2 xor BigEndianCPU) = 0 then
          data ← 0^32 || GPR[rt]_31-8*byte..0 || 0^(8*byte)
      else
          data ← GPR[rt]_31-8*byte..0 || 0^(8*byte) || 0^32
      endif
      StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA)

64 T: vAddr ← ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr ← pAddr_31..2 || 0^2
      endif
      byte ← vAddr_1..0 xor BigEndianCPU^2
      if (vAddr_2 xor BigEndianCPU) = 0 then
          data ← 0^32 || GPR[rt]_31-8*byte..0 || 0^(8*byte)
      else
          data ← GPR[rt]_31-8*byte..0 || 0^(8*byte) || 0^32
      endif
      StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA)
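The SWR `byte`/`data` computation above can be transliterated the same way. This is a sketch under stated assumptions: `rt` is the low-order 32 bits of GPR[rt], `big_endian_cpu` is 0 or 1, and WORD is taken to be 3 because that makes `WORD - byte` match the Type column of the relationship table that follows (an inference, not a quoted definition). `swr_data` is a hypothetical name.

```python
WORD = 3  # assumed access-type code for a full word (inferred from the Type column)


def swr_data(vaddr: int, rt: int, big_endian_cpu: int) -> tuple[int, int, int]:
    """Return (byte, access_type, data) as computed by the SWR Operation pseudocode."""
    byte = (vaddr & 0b11) ^ (0b11 if big_endian_cpu else 0b00)  # vAddr_1..0 xor BigEndianCPU^2
    field = (rt & 0xFFFFFFFF) & ((1 << (32 - 8 * byte)) - 1)     # GPR[rt]_31-8*byte..0
    if (((vaddr >> 2) & 1) ^ big_endian_cpu) == 0:
        data = field << (8 * byte)         # 0^32 || field || 0^(8*byte)
    else:
        data = field << (32 + 8 * byte)    # field || 0^(8*byte) || 0^32
    return byte, WORD - byte, data
```

On a big-endian CPU with vAddr = 4, the result is a one-byte store (type 0) with register byte D in bits 31..24, matching the "D 5 6 7" result in the example above.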
SWR Store Word Right (Continued) SWR


The relationships between the register contents given to the SWR instruction and 
the result (bytes for words in the memory) are shown below: 


SWR
Register:  A B C D E F G H
Memory:    I J K L M N O P

            BigEndianCPU = 0                       BigEndianCPU = 1
                                   offset                                 offset
vAddr_2..0  destination      type  LEM  BEM        destination      type  LEM  BEM
0           I J K L E F G H   3     0    4         H J K L M N O P   0     7    0
1           I J K L F G H P   2     1    4         G H K L M N O P   1     6    0
2           I J K L G H O P   1     2    4         F G H L M N O P   2     5    0
3           I J K L H N O P   0     3    4         E F G H M N O P   3     4    0
4           E F G H M N O P   3     4    0         I J K L H N O P   0     3    4
5           F G H L M N O P   2     5    0         I J K L G H O P   1     2    4
6           G H K L M N O P   1     6    0         I J K L F G H P   2     1    4
7           H J K L M N O P   0     7    0         I J K L E F G H   3     0    4

Remark  Type:   access type output to memory (Refer to Figure 3-2 Byte
                Access within a Doubleword.)
        Offset: pAddr_2..0 output to memory
        LEM     Little-endian memory (BigEndianMem = 0)
        BEM     Big-endian memory (BigEndianMem = 1)
Exceptions: 


TLB miss exception 

TLB invalid exception 

TLB modification exception 
Bus error exception 
Address error exception 
SYNC Synchronize SYNC

31 26 25 6 5 0
SPECIAL 0 SYNC
000000 0000 0000 0000 0000 0000 001111
6 20 6

Format: 
SYNC 
Description: 


The SYNC instruction is executed as a NOP on the VR4300. It is defined to
maintain software compatibility with code written for the VR4400.


Operation: 


32,64 T: SyncOperation () 


Exceptions: 


None 
SYSCALL System Call SYSCALL


31 26 25 6 5 0 
SPECIAL Code SYSCALL 
000000 001100 
6 20 6 
Format: 
SYSCALL 
Description: 


A system call exception occurs after this instruction is executed, unconditionally 
transferring control to the exception handler. 


A parameter can be sent to the exception handler by using the code area. If the 
exception handler uses this parameter, the contents of the memory word including 
the instruction must be loaded as data. 
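As a sketch of how a handler might recover the parameter once it has loaded the instruction word as data: the code field occupies bits 25..6 of the SYSCALL encoding shown above. `syscall_code` is a hypothetical helper, and how the handler locates the instruction word is outside this sketch.

```python
SPECIAL = 0b000000
SYSCALL_FUNCTION = 0b001100


def syscall_code(instr: int) -> int:
    """Return the 20-bit code field (bits 25..6) of a SYSCALL instruction word."""
    if ((instr >> 26) & 0x3F) != SPECIAL or (instr & 0x3F) != SYSCALL_FUNCTION:
        raise ValueError("not a SYSCALL instruction")
    return (instr >> 6) & 0xFFFFF
```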


Operation: 


32,64 T: SystemCallException 


Exceptions: 


System Call exception 
TEQ Trap If Equal TEQ


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt code TEQ 
000000 110100 

6 5 5 10 6 


Format: 
TEQ rs, rt 


Description: 


The contents of general purpose register rt are compared with general purpose 
register rs. If the contents of general purpose register rs are equal to the contents 
of general purpose register rt, a trap exception occurs. 


A parameter can be sent to the exception handler by using the code area. If the 
exception handler uses this parameter, the contents of the memory word including 
the instruction must be loaded as data. 
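For the register-register trap instructions, the code field is the 10 bits at positions 15..6. The sketch below decodes such a word after it has been loaded as data. The function values are taken from the TGE, TGEU, TLT, TLTU, TEQ, and TNE encodings in this chapter; `decode_trap` is a hypothetical name.

```python
TRAP_FUNCTIONS = {
    0b110000: "TGE", 0b110001: "TGEU", 0b110010: "TLT",
    0b110011: "TLTU", 0b110100: "TEQ", 0b110110: "TNE",
}


def decode_trap(instr: int) -> tuple[str, int, int, int]:
    """Split a SPECIAL-format trap instruction into (mnemonic, rs, rt, code)."""
    if ((instr >> 26) & 0x3F) != 0 or (instr & 0x3F) not in TRAP_FUNCTIONS:
        raise ValueError("not a register-register trap instruction")
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    code = (instr >> 6) & 0x3FF   # 10-bit code field, bits 15..6
    return TRAP_FUNCTIONS[instr & 0x3F], rs, rt, code
```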


Operation: 
32, 64 T: if GPR[rs] = GPR[rt] then 
TrapException 
endif 
Exceptions: 


Trap exception 
TEQI Trap If Equal Immediate TEQI

31 26 25 21 20 16 15 0
REGIMM rs TEQI immediate
000001 01100
6 5 5 16
Format:
TEQI rs, immediate
Description:
The 16-bit immediate is sign-extended and compared with the contents of general
purpose register rs. If the contents of general purpose register rs are equal to the
sign-extended immediate, a trap exception occurs.
Operation:
32 T: if GPR[rs] = (immediate_15)^16 || immediate_15..0 then
TrapException
endif
64 T: if GPR[rs] = (immediate_15)^48 || immediate_15..0 then
TrapException
endif
Exceptions:
Trap exception
TGE Trap If Greater Than Or Equal TGE


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt code TGE 
000000 110000 
6 5 5 10 6 

Format: 
TGE rs, rt 
Description: 


The contents of general purpose register rt are compared with the contents of 
general purpose register rs. Assuming both register contents are signed integers, 
if the contents of general purpose register rs are greater than or equal to the 
contents of general purpose register rt, a trap exception occurs. 


A parameter can be sent to the exception handler by using the code area. If the 
exception handler uses this parameter, the contents of the memory word including 
the instruction must be loaded as data. 


Operation: 


32,64 T: if GPR[rs] ≥ GPR[rt] then
TrapException 
endif 


Exceptions: 


Trap exception 
TGEI Trap If Greater Than Or Equal Immediate TGEI


31 26 25 21 20 16 15 0 
REGIMM rs TGEI immediate
000001 01000 
6 5 5 16 
Format: 


TGEI rs, immediate 


Description: 


The 16-bit immediate is sign-extended and compared with the contents of general 
purpose register rs. Assuming both values are signed integers, if the contents of 
general purpose register rs are greater than or equal to the sign-extended 
immediate, a trap exception occurs. 


Operation: 
32 T: if GPR[rs] ≥ (immediate_15)^16 || immediate_15..0 then
TrapException
endif
64 T: if GPR[rs] ≥ (immediate_15)^48 || immediate_15..0 then
TrapException
endif
Exceptions: 


Trap exception 
TGEIU Trap If Greater Than Or Equal Immediate Unsigned TGEIU


31 26 25 21 20 16 15 0 
REGIMM rs TGEIU immediate 
000001 01001 
6 5 5 16 
Format: 


TGEIU rs, immediate 


Description: 


The 16-bit immediate is sign-extended and compared with the contents of general 
purpose register rs. Assuming both values are unsigned integers, if the contents 
of general purpose register rs are greater than or equal to the sign-extended 
immediate, a trap exception occurs. 


Operation: 
32 T: if (0 || GPR[rs]) ≥ (0 || (immediate_15)^16 || immediate_15..0) then
TrapException
endif
64 T: if (0 || GPR[rs]) ≥ (0 || (immediate_15)^48 || immediate_15..0) then
TrapException
endif


Exceptions: 


Trap exception 
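TGEIU sign-extends its immediate but then compares unsigned, so an immediate of 0xFFFF becomes 0xFFFFFFFF, the largest unsigned 32-bit value. The sketch below contrasts the 32-bit trap conditions of TGEI and TGEIU on the same inputs. It models only the comparison, not the exception, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """(immediate_15)^16 || immediate_15..0, returned as a 32-bit pattern."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def tgei_traps(gpr_rs: int, imm: int) -> bool:
    """32-bit TGEI: the same bit patterns compared as signed integers."""
    return to_signed32(gpr_rs & MASK32) >= to_signed32(sign_extend16(imm))


def tgeiu_traps(gpr_rs: int, imm: int) -> bool:
    """32-bit TGEIU: sign-extend the immediate, then compare as unsigned."""
    return (gpr_rs & MASK32) >= sign_extend16(imm)
```

With GPR[rs] = 5 and immediate 0xFFFF, TGEI traps (5 ≥ -1) while TGEIU does not (5 < 0xFFFFFFFF).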
TGEU Trap If Greater Than Or Equal Unsigned TGEU


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt code TGEU 
000000 110001 
6 5 5 10 6 


Format: 
TGEU rs, rt 


Description: 


The contents of general purpose register rt are compared with the contents of 
general purpose register rs. Assuming both values are unsigned integers, if the 
contents of general purpose register rs are greater than or equal to the contents of 
general purpose register rt, a trap exception occurs. 


A parameter can be sent to the exception handler by using the code area. If the 
exception handler uses this parameter, the contents of the memory word including 
the instruction must be loaded as data. 


Operation: 


32,64 T: if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then
TrapException
endif


Exceptions: 


Trap exception 


User’s Manual U10504EJ7V0UM00 531


Chapter 16 


TLBP Probe TLB For Matching Entry TLBP


31 26 25 24 6 5 0
COP0 CO 0 TLBP
010000 1 0000000 0000 0000 0000 001000 
6 1 19 6 
Format: 
TLBP 
Description: 


Searches the TLB for an entry that matches the contents of the EntryHi register and
sets the number of that TLB entry in the Index register. If no matching entry is
found, the most significant bit of the Index register is set.


The architecture does not specify the operation of memory references associated 
with the instruction immediately after a TLBP instruction, nor is the operation 
specified if more than one TLB entry matches. 


Operation: 


32 T: Index ← 1 || 0^25 || Undefined^6
      for i in 0...TLBEntries-1
          if (TLB[i]_95..77 = EntryHi_31..13) and (TLB[i]_76 or
             (TLB[i]_71..64 = EntryHi_7..0)) then
              Index ← 0^26 || i_5..0
          endif
      endfor

64 T: Index ← 1 || 0^25 || Undefined^6
      for i in 0...TLBEntries-1
          if (TLB[i]_167..141 and not (0^15 || TLB[i]_216..205))
             = (EntryHi_39..13 and not (0^15 || TLB[i]_216..205)) and
             (TLB[i]_140 or (TLB[i]_135..128 = EntryHi_7..0)) then
              Index ← 0^26 || i_5..0
          endif
      endfor


Exceptions: 


Coprocessor unusable exception 
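A minimal model of the 32-bit TLBP Operation, assuming a simplified entry that keeps only the VPN2, G, and ASID fields the probe compares. The manual leaves the low-order bits of Index undefined on a miss and leaves multiple matches unspecified; this sketch returns 0 in those bits and keeps the last match, which is simply what the loop does. `TlbEntry` and `tlbp` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    """Hypothetical, simplified TLB entry: only the fields TLBP compares."""
    vpn2: int   # TLB[i]_95..77 (19 bits)
    g: bool     # TLB[i]_76, the global bit
    asid: int   # TLB[i]_71..64 (8 bits)


PROBE_FAILURE = 1 << 31  # Index <- 1 || 0^25 || Undefined^6 (undefined bits modeled as 0)


def tlbp(tlb: list[TlbEntry], entry_hi: int) -> int:
    """Return the Index value produced by a 32-bit-mode TLBP."""
    vpn2 = (entry_hi >> 13) & 0x7FFFF  # EntryHi_31..13
    asid = entry_hi & 0xFF             # EntryHi_7..0
    index = PROBE_FAILURE
    for i, entry in enumerate(tlb):
        if entry.vpn2 == vpn2 and (entry.g or entry.asid == asid):
            index = i & 0x3F           # 0^26 || i_5..0
    return index
```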
TLBR Read Indexed TLB Entry TLBR
31 26 25 24 6 5 0
COP0 CO 0 TLBR
010000 1 0000000 0000 0000 0000 000001

6 1 19 6 
Format: 
TLBR 
Description: 


The EntryHi and EntryLo registers are loaded with the contents of the TLB entry
pointed at by the contents of the Index register. The G bit (which controls ASID
matching) read from the TLB is written into both the EntryLo0 and EntryLo1
registers.

The operation is invalid if the contents of the Index register are greater than the
number of TLB entries in the processor.


Operation: 


32 T: PageMask ← TLB[Index_5..0]_127..96
      EntryHi ← TLB[Index_5..0]_95..64 and not TLB[Index_5..0]_127..96
      EntryLo1 ← TLB[Index_5..0]_63..33 || TLB[Index_5..0]_76
      EntryLo0 ← TLB[Index_5..0]_31..1 || TLB[Index_5..0]_76

64 T: PageMask ← TLB[Index_5..0]_255..192
      EntryHi ← TLB[Index_5..0]_191..128 and not TLB[Index_5..0]_255..192
      EntryLo1 ← TLB[Index_5..0]_127..65 || TLB[Index_5..0]_140
      EntryLo0 ← TLB[Index_5..0]_63..1 || TLB[Index_5..0]_140


Exceptions: 


Coprocessor unusable exception 
TLBWI Write Indexed TLB Entry TLBWI

31 26 25 24 6 5 0
COP0 CO 0 TLBWI
010000 1 0000000 0000 0000 0000 000010 
6 1 19 6 
Format: 
TLBWI 
Description: 


The TLB entry pointed at by the Index register is loaded with the contents of the
EntryHi and EntryLo registers. The G bit of the TLB is written with the logical
AND of the G bits in the EntryLo0 and EntryLo1 registers.


The operation is invalid if the contents of the Index register are greater than the 
number of TLB entries in the processor. 


Operation: 


32,64 T: TLB[Index_5..0] ←
PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0


Exceptions: 


Coprocessor unusable exception 
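Because TLBWI (and TLBWR) stores only the AND of the two G bits, and TLBR copies that single bit back into both EntryLo registers, a G bit set in only one EntryLo is lost after a write and read-back. The sketch below shows the 32-bit round trip, assuming G is bit 0 of each EntryLo (as the TLBR Operation implies) and treating all other bits as stored unchanged, which is a simplification. `tlbwi_then_tlbr` is a hypothetical name.

```python
def tlbwi_then_tlbr(entry_lo0: int, entry_lo1: int) -> tuple[int, int]:
    """Write EntryLo0/EntryLo1 with TLBWI, then read them back with TLBR (32-bit mode)."""
    g = entry_lo0 & entry_lo1 & 1                 # TLB G bit = AND of both G bits
    read_lo0 = (entry_lo0 & 0xFFFFFFFE) | g       # TLB[...]_31..1 || G
    read_lo1 = (entry_lo1 & 0xFFFFFFFE) | g       # TLB[...]_63..33 || G
    return read_lo0, read_lo1
```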
TLBWR Write Random TLB Entry TLBWR

31 26 25 24 6 5 0
COP0 CO 0 TLBWR
010000 1 000 0000 0000 0000 0000 000110 
6 1 19 6 
Format: 
TLBWR 
Description: 


The TLB entry pointed at by the Random register is loaded with the contents of
the EntryHi and EntryLo registers. The G bit of the TLB is written with the logical
AND of the G bits in the EntryLo0 and EntryLo1 registers.


Operation: 


32,64 T: TLB[Random_5..0] ←
PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0


Exceptions: 


Coprocessor unusable exception 
TLT Trap If Less Than TLT


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt code TLT 
000000 110010 
6 5 5 10 6 
Format: 
TLT rs, rt 
Description: 


The contents of general purpose register rt are compared with general purpose 
register rs. Assuming both values are signed integers, if the contents of general 
purpose register rs are less than the contents of general purpose register rt, a trap 
exception occurs. 


A parameter can be sent to the exception handler by using the code area. If the 
exception handler uses this parameter, the contents of the memory word including 
the instruction must be loaded as data. 


Operation: 


32,64 T: if GPR[rs] < GPR[rt] then 
TrapException 
endif 


Exceptions: 


Trap exception 
TLTI Trap If Less Than Immediate TLTI


31 26 25 21 20 16 15 0 
REGIMM rs TLTI immediate 
000001 01010 
6 5 5 16 
Format: 


TLTI rs, immediate 


Description: 


The 16-bit immediate is sign-extended and compared with the contents of general 
purpose register rs. Assuming both values are signed integers, if the contents of 
general purpose register rs are less than the sign-extended immediate, a trap 
exception occurs. 


Operation: 
32 T: if GPR[rs] < (immediate_15)^16 || immediate_15..0 then
TrapException
endif
64 T: if GPR[rs] < (immediate_15)^48 || immediate_15..0 then
TrapException
endif
Exceptions: 


Trap exception 
TLTIU Trap If Less Than Immediate Unsigned TLTIU

31 26 25 21 20 16 15 0
REGIMM rs TLTIU immediate 
000001 01011 
6 5 5 16 
Format: 


TLTIU rs, immediate 


Description: 


The 16-bit immediate is sign-extended and compared with the contents of general 
purpose register rs. Assuming both values are unsigned integers, if the contents 
of general purpose register rs are less than the sign-extended immediate, a trap 
exception occurs. 


Operation: 
32 T: if (0 || GPR[rs]) < (0 || (immediate_15)^16 || immediate_15..0) then
TrapException
endif
64 T: if (0 || GPR[rs]) < (0 || (immediate_15)^48 || immediate_15..0) then
TrapException
endif


Exceptions: 


Trap exception 
TLTU Trap If Less Than Unsigned TLTU


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt code TLTU 
000000 110011 
6 5 5 10 6 

Format: 
TLTU rs, rt 
Description: 


The contents of general purpose register rt are compared with general purpose
register rs. Assuming both values are unsigned integers, if the contents of general 
purpose register rs are less than the contents of general purpose register rt, a trap 
exception occurs. 


A parameter can be sent to the exception handler by using the code area. If the 
exception handler uses this parameter, the contents of the memory word including 
the instruction must be loaded as data. 


Operation: 


32,64 T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
TrapException
endif


Exceptions: 


Trap exception 
TNE Trap If Not Equal TNE


31 26 25 21 20 16 15 6 5 0 
SPECIAL rs rt code TNE 
000000 110110 
6 5 5 10 6 


Format: 
TNE rs, rt 


Description: 


The contents of general purpose register rt are compared with general purpose 
register rs. If the contents of general purpose register rs are not equal to the 
contents of general purpose register rt, a trap exception occurs. 


A parameter can be sent to the exception handler by using the code area. If the 
exception handler uses this parameter, the contents of the memory word including 
the instruction must be loaded as data. 


Operation: 


32,64 T: if GPR[rs] ≠ GPR[rt] then
TrapException 
endif 


Exceptions: 


Trap exception 
TNEI Trap If Not Equal Immediate TNEI


31 26 25 21 20 16 15 0 
REGIMM rs TNEI immediate 
000001 01110 
6 5 5 16 
Format: 


TNEI rs, immediate 


Description: 


The 16-bit immediate is sign-extended and compared with the contents of general 
purpose register rs. If the contents of general purpose register rs are not equal to 
the sign-extended immediate, a trap exception occurs. 


Operation: 
32 T: if GPR[rs] ≠ (immediate_15)^16 || immediate_15..0 then
TrapException
endif
64 T: if GPR[rs] ≠ (immediate_15)^48 || immediate_15..0 then
TrapException
endif
Exceptions: 


Trap exception 
XOR Exclusive Or XOR


31 26 25 21 20 16 15 11 10 6 5 0 
SPECIAL rs rt rd 0 XOR 
000000 00000 100110 
6 5 5 5 5 6 
Format: 
XOR rd, rs, rt 
Description: 


The contents of general purpose register rs and the contents of general purpose
register rt are combined in a bitwise logical exclusive OR. The result is stored in
general purpose register rd.


Operation: 


32, 64 T: GPR[rd] ← GPR[rs] xor GPR[rt]


Exceptions: 


None 
XORI Exclusive Or Immediate XORI


31 26 25 21 20 16 15 0 


XORI rs rt immediate 
001110 


6 5 5 16 


Format: 
XORI rt, rs, immediate 


Description: 


The 16-bit zero-extended immediate and the contents of general purpose register
rs are combined in a bitwise logical exclusive OR. The result is stored in general
purpose register rt.


Operation: 


32 T: GPR[rt] ← GPR[rs] xor (0^16 || immediate)
64 T: GPR[rt] ← GPR[rs] xor (0^48 || immediate)


Exceptions: 


None 
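The XORI immediate is zero-extended, not sign-extended, so the upper 16 bits of GPR[rs] always pass through unchanged even when bit 15 of the immediate is set. A minimal 32-bit sketch; `xori32` is a hypothetical name.

```python
def xori32(gpr_rs: int, immediate: int) -> int:
    """32-bit XORI: GPR[rt] <- GPR[rs] xor (0^16 || immediate)."""
    return (gpr_rs ^ (immediate & 0xFFFF)) & 0xFFFFFFFF
```

For example, `xori32(0xFFFFFFFF, 0x8000)` flips only bit 15 and yields 0xFFFF7FFF.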
16.7 CPU Instruction Opcode Bit Encoding

Figure 16-1 lists the VR4300 opcode bit encoding.

Opcode
31..29 \ 28..26
         0        1        2        3        4        5        6        7
0        SPECIAL  REGIMM   J        JAL      BEQ      BNE      BLEZ     BGTZ
1        ADDI     ADDIU    SLTI     SLTIU    ANDI     ORI      XORI     LUI
2        COP0     COP1     COP2     *        BEQL     BNEL     BLEZL    BGTZL
3        DADDIε   DADDIUε  LDLε     LDRε     *        *        *        *
4        LB       LH       LWL      LW       LBU      LHU      LWR      LWUε
5        SB       SH       SWL      SW       SDLε     SDRε     SWR      CACHE δ
6        LL       LWC1     LWC2     *        LLDε     LDC1     LDC2     LDε
7        SC       SWC1     SWC2     *        SCDε     SDC1     SDC2     SDε

SPECIAL function
5..3 \ 2..0
         0        1        2        3        4        5        6        7
0        SLL      *        SRL      SRA      SLLV     *        SRLV     SRAV
1        JR       JALR     *        *        SYSCALL  BREAK    *        SYNC
2        MFHI     MTHI     MFLO     MTLO     DSLLVε   *        DSRLVε   DSRAVε
3        MULT     MULTU    DIV      DIVU     DMULTε   DMULTUε  DDIVε    DDIVUε
4        ADD      ADDU     SUB      SUBU     AND      OR       XOR      NOR
5        *        *        SLT      SLTU     DADDε    DADDUε   DSUBε    DSUBUε
6        TGE      TGEU     TLT      TLTU     TEQ      *        TNE      *
7        DSLLε    *        DSRLε    DSRAε    DSLL32ε  *        DSRL32ε  DSRA32ε

REGIMM rt
20..19 \ 18..16
         0        1        2        3        4        5        6        7
0        BLTZ     BGEZ     BLTZL    BGEZL    *        *        *        *
1        TGEI     TGEIU    TLTI     TLTIU    TEQI     *        TNEI     *
2        BLTZAL   BGEZAL   BLTZALL  BGEZALL  *        *        *        *
3        *        *        *        *        *        *        *        *

COPz rs
25..24 \ 23..21
         0        1        2        3        4        5        6        7
0        MF       DMFε     CF       γ        MT       DMTε     CT       γ
1        BC       γ        γ        γ        γ        γ        γ        γ
2        CO
3        CO

Figure 16-1 VR4300 Opcode Bit Encoding (1/2)
COPz rt
20..19 \ 18..16
         0        1        2        3        4        5        6        7
0        BCF      BCT      BCFL     BCTL     γ        γ        γ        γ
1        γ        γ        γ        γ        γ        γ        γ        γ
2        γ        γ        γ        γ        γ        γ        γ        γ
3        γ        γ        γ        γ        γ        γ        γ        γ

CP0 Function
5..3 \ 2..0
         0        1        2        3        4        5        6        7
0        φ        TLBR     TLBWI    φ        φ        φ        TLBWR    φ
1        TLBP     φ        φ        φ        φ        φ        φ        φ
2        ξ        φ        φ        φ        φ        φ        φ        φ
3        ERET     χ        φ        φ        φ        φ        φ        φ
4        φ        φ        φ        φ        φ        φ        φ        φ
5        φ        φ        φ        φ        φ        φ        φ        φ
6        φ        φ        φ        φ        φ        φ        φ        φ
7        φ        φ        φ        φ        φ        φ        φ        φ

Figure 16-1 VR4300 Opcode Bit Encoding (2/2)

Key:

*   If an operation code marked with an asterisk is executed on the current
    VR4300, a reserved instruction exception occurs. These codes are reserved
    for future expansion.

γ   Operation codes marked with a gamma cause a reserved instruction
    exception. They are reserved for future expansion.

δ   Operation codes marked with a delta are valid only for VR4000 processors
    with CP0 enabled, and cause a reserved instruction exception on other
    processors.

φ   Operation codes marked with a phi are invalid but do not cause reserved
    instruction exceptions in VR4300 operation.

ξ   Operation codes marked with a xi cause a reserved instruction exception
    only on VR4300 processors.

χ   Operation codes marked with a chi are valid only on VR4000 Series
    processors.

ε   Operation codes marked with an epsilon are valid in 64-bit mode and in
    32-bit Kernel mode. In 32-bit User or Supervisor mode, they generate a
    reserved instruction exception.
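A disassembler-style lookup of the primary opcode table can be sketched as follows: bits 31..29 select the row and bits 28..26 the column. This is illustrative only. The ε and δ markers and the reserved-instruction behavior described in the key are not modeled, and `primary_mnemonic` is a hypothetical name.

```python
# Primary opcode map from Figure 16-1 (markers dropped); "*" marks reserved codes.
OPCODES = [
    ["SPECIAL", "REGIMM", "J", "JAL", "BEQ", "BNE", "BLEZ", "BGTZ"],
    ["ADDI", "ADDIU", "SLTI", "SLTIU", "ANDI", "ORI", "XORI", "LUI"],
    ["COP0", "COP1", "COP2", "*", "BEQL", "BNEL", "BLEZL", "BGTZL"],
    ["DADDI", "DADDIU", "LDL", "LDR", "*", "*", "*", "*"],
    ["LB", "LH", "LWL", "LW", "LBU", "LHU", "LWR", "LWU"],
    ["SB", "SH", "SWL", "SW", "SDL", "SDR", "SWR", "CACHE"],
    ["LL", "LWC1", "LWC2", "*", "LLD", "LDC1", "LDC2", "LD"],
    ["SC", "SWC1", "SWC2", "*", "SCD", "SDC1", "SDC2", "SD"],
]


def primary_mnemonic(instr: int) -> str:
    """Look up the primary opcode (bits 31..26) of a 32-bit instruction word."""
    op = (instr >> 26) & 0x3F
    return OPCODES[op >> 3][op & 0b111]   # row = bits 31..29, column = bits 28..26
```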
17 FPU Instruction Set Details


This chapter provides a detailed description of each floating-point unit (FPU) 
instruction in alphabetical order. 
17.1 Instruction Formats


There are three basic instruction format types:

• I-Type, or Immediate format, which includes load and store instructions
• R-Type, or Register format, which includes the two- and three-register
  floating-point instructions
• Other, which includes branch instructions and transfer (move to and from)
  instructions

The instruction description subsections that follow show how these three basic
instruction formats are used by:

• Load and store instructions
• Transfer instructions
• Floating-point arithmetic instructions
• Floating-point branch instructions


Floating-point instructions are mapped onto the MIPS coprocessor instructions,
defining coprocessor unit number one (CP1) as the floating-point unit.


Each operation is valid only for certain formats. Implementations may support
some of these formats and operations through emulation, but they only need to
support combinations that are valid (marked V in Table 17-1). Combinations
marked R in Table 17-1 are not currently specified by this architecture and cause
an unimplemented instruction exception. They are reserved for future extensions
of the architecture.
Table 17-1

                   Source Format
Operation    Single   Double   Word   Longword
ADD          V        V        R      R
SUB          V        V        R      R
MUL          V        V        R      R
DIV          V        V        R      R
SQRT         V        V        R      R
ABS          V        V        R      R
MOV          V        V        R      R
NEG          V        V        R      R
TRUNC.L      V        V        R      R
ROUND.L      V        V        R      R
CEIL.L       V        V        R      R
FLOOR.L      V        V        R      R
TRUNC.W      V        V        R      R
ROUND.W      V        V        R      R
CEIL.W       V        V        R      R
FLOOR.W      V        V        R      R
CVT.S        R        V        V      V
CVT.D        V        R        V      V
CVT.W        V        V        R      R
CVT.L        V        V        R      R
C            V        V        R      R

Remark V: Valid
       R: Reserved
Because an FPU branch instruction can test either the true or the false sense of a
condition, all 32 conditions can be covered with only 16 compare predicates, as
shown in Table 17-2.


Table 17-2 Logical Reverse of Predicates by Condition True/False

Condition                  Relations                                   Invalid Operation
Mnemonic                                                               Exception If
True    False   Code       Greater Than  Less Than  Equal  Unordered   Unordered
F       T       0          F             F          F      F           No
UN      OR      1          F             F          F      T           No
EQ      NEQ     2          F             F          T      F           No
UEQ     OGL     3          F             F          T      T           No
OLT     UGE     4          F             T          F      F           No
ULT     OGE     5          F             T          F      T           No
OLE     UGT     6          F             T          T      F           No
ULE     OGT     7          F             T          T      T           No
SF      ST      8          F             F          F      F           Yes
NGLE    GLE     9          F             F          F      T           Yes
SEQ     SNE     10         F             F          T      F           Yes
NGL     GL      11         F             F          T      T           Yes
LT      NLT     12         F             T          F      F           Yes
NGE     GE      13         F             T          F      T           Yes
LE      NLE     14         F             T          T      F           Yes
NGT     GT      15         F             T          T      T           Yes

Remark F: False
       T: True
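Each compare predicate is an OR of the less-than, equal, and unordered relations, so the table can be read as a bit test on the condition code: bit 2 selects less than, bit 1 equal, and bit 0 unordered, while bit 3 only chooses whether an unordered result raises the invalid operation exception. The sketch below models the true/false result only (not the exception); `fp_compare` is a hypothetical helper, not an FPU API. Testing the false sense of the result, as a branch-on-false does, gives the second mnemonic in each row.

```python
import math


def fp_compare(cond: int, fs: float, ft: float) -> bool:
    """Evaluate compare predicate `cond` (0-15) of Table 17-2 for operands fs and ft."""
    unordered = math.isnan(fs) or math.isnan(ft)
    less = not unordered and fs < ft
    equal = not unordered and fs == ft
    return bool(
        (cond & 0b100 and less)
        or (cond & 0b010 and equal)
        or (cond & 0b001 and unordered)
    )
```

For example, OLT (code 4) is false when either operand is a NaN, while ULT (code 5) is true.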
Floating-Point Loads, Stores, and Transfers


All movement of data between the floating-point unit (FPU) and memory is
accomplished by coprocessor load and store instructions, which reference the FPU
general purpose registers. These instructions are unformatted: they perform no
format conversions and, therefore, cannot cause floating-point exceptions.


Data may also be directly moved between the floating-point unit and the processor
by move to coprocessor (MTC) and move from coprocessor (MFC) instructions.
Like the floating-point load and store instructions, these instructions perform no
format conversions and never cause floating-point exceptions.


In addition, the FPU provides two floating-point control registers, which can be
accessed only by the CTC1 and CFC1 instructions.


Floating-Point Operations 


The floating-point unit instruction set includes:
• floating-point add
• floating-point subtract
• floating-point multiply
• floating-point divide
• floating-point square root
• convert between fixed-point and floating-point formats
• convert between floating-point formats
• floating-point compare


These operations satisfy the accuracy requirements of IEEE Standard 754.
Specifically, each operation produces a result identical to the infinite-precision
result rounded to the specified format, using the current rounding mode.


Instructions must specify the format of their operands. Except for conversion 
functions, mixed-format operations cannot be performed. 
17.2 Instruction Notation Conventions


In this chapter, all variable subfields in an instruction format (such as fs, ft,
immediate, and so on) are shown in lowercase. Instruction names (such as ADD,
SUB, and so on) are shown in uppercase.


For the sake of clarity, we sometimes use an alias for a variable subfield in the 
formats of specific instructions. For example, we use rs = base in the format for 
load and store instructions. Such an alias is always lowercase, since it refers to a 
variable subfield. 


In some instructions, the instruction subfields op and function have fixed 6-bit
values. These instructions use uppercase mnemonics. For instance, in the
floating-point ADD instruction we use op = COP1 and function = FADD. In other
cases, a single field has both fixed and variable subfields, so the name contains
both uppercase and lowercase characters. The actual encodings of all mnemonics
and function fields are given in 17.6 FPU Instruction Opcode Bit Encoding. The
Operation section of each instruction describes its effect in a high-level-language
notation. For the meanings of the special symbols used there, refer to
Table 16-1 CPU Instruction Operation Notations.


Instruction Notation Examples 




The following examples illustrate the application of some of the instruction 
notations: 


Example #1: 
GPR[rt] ← immediate || 0^16


Sixteen zero bits are concatenated with a low-order immediate value (typically 
16 bits), and the 32-bit string is assigned to General Purpose Register rt. 


Example #2: 
(immediate_15)^16 || immediate_15..0


Bit 15 (the sign bit) of an immediate value is extended for 16 bit positions, and 
the result is concatenated with bits 15 through 0 of the immediate value to 
form a 32-bit sign-extended value. 
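The replication (^) and concatenation (||) operators in Examples #1 and #2 map directly onto masks and shifts. A minimal sketch on 32-bit values, assuming the immediate is the low-order 16 bits of its argument; both function names are hypothetical.

```python
def sign_extend_16_to_32(immediate: int) -> int:
    """(immediate_15)^16 || immediate_15..0: copy bit 15 into bits 31..16."""
    immediate &= 0xFFFF
    upper = 0xFFFF if immediate & 0x8000 else 0x0000  # (immediate_15)^16
    return (upper << 16) | immediate                  # "||" concatenates bit strings


def shift_left_16(immediate: int) -> int:
    """immediate || 0^16: the 16-bit immediate followed by sixteen zero bits."""
    return (immediate & 0xFFFF) << 16
```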
Example #3: 

CPR[1, ft] ← data


Data is assigned to general purpose register ft of CP1, in other words, to a
floating-point general purpose register (FGR).




FPU Instruction Set Details 


17.3 Load and Store Instructions 


In the VR4300 implementation, the instruction immediately following a load may
use the contents of the register being loaded; in such cases, the hardware
interlocks for the number of cycles required to read the data. Scheduling load
delay slots is therefore not required for functionally correct code, but it is still
desirable when performance is the most important factor or when compatibility
with the VR3000 Series is required.


The operation of the load and store instructions depends on the width of the
FGRs.


When the FR bit in the Status register equals zero, the floating-point general
purpose registers (FGRs) are 32 bits wide.

• To hold single-precision floating-point data, the sixteen even-numbered
  registers of the thirty-two FGRs can be accessed.
• To hold double-precision floating-point data, the even-numbered registers hold
  the low-order bits of the data and the odd-numbered registers hold the
  high-order bits. The registers are used as even-odd pairs, so sixteen
  double-precision values can be held.

When the FR bit in the Status register equals one, the FGRs are 64 bits wide.

• To hold single-precision floating-point data, the low-order bits of the
  thirty-two FGRs are used.
• To hold double-precision floating-point data, all thirty-two FGRs are used.
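To make the FR = 0 pairing concrete, the sketch below splits an IEEE 754 double into the two 32-bit words that would occupy an even-odd FGR pair, with the low-order word in the even register, and joins them back. It uses Python's `struct` module only to obtain the bit pattern; the function names are hypothetical.

```python
import struct


def split_double_fr0(value: float) -> tuple[int, int]:
    """Return (even_fgr, odd_fgr) for a double held as an even-odd pair (FR = 0)."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32   # even: low-order word, odd: high-order word


def join_double_fr0(even_fgr: int, odd_fgr: int) -> float:
    """Rebuild the double from an even-odd FGR pair."""
    bits = (odd_fgr << 32) | even_fgr
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

For 1.0 (bit pattern 0x3FF0000000000000), the even register holds 0 and the odd register holds 0x3FF00000.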


In the load and store operation descriptions, the functions listed in
Table 17-3 are used to summarize the handling of virtual addresses and physical
memory.
Table 17-3 Load and Store Instructions Common Functions

Function             Meaning
AddressTranslation   Uses the TLB to find the physical address given by the virtual
                     address. The function fails and a TLB miss exception occurs if
                     the required translation is not present in the TLB.
LoadMemory           Searches the cache and main memory to find the contents of the
                     specified physical address at the specified data length
                     (doubleword or word), and loads the contents. If the cache is
                     enabled, the contents are loaded into the cache.
StoreMemory          Searches the cache, write buffer, and main memory, and stores the
                     contents to the specified physical address at the specified data
                     length (doubleword or word).

Figure 17-1 shows the I-Type instruction format used by load and store
instructions.


Figure 17-1 shows the I-Type instruction format used by load and store 
instructions. 


I-Type (Immediate)

 31    26 25   21 20   16 15               0
+--------+-------+-------+------------------+
|   op   | base  |  ft   |      offset      |
+--------+-------+-------+------------------+
    6        5       5           16

op      is a 6-bit opcode
base    is the 5-bit base register specifier
ft      is a 5-bit source (for stores) or destination (for loads) FPU register specifier
offset  is the 16-bit signed immediate offset

Figure 17-1 Load and Store Instruction Format
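The following C sketch shows how the four fields of Figure 17-1 are pulled out of a 32-bit instruction word. The struct and function names are hypothetical. The bit positions are the ones given in the figure. The offset is sign-extended because it is a signed immediate.

```c
#include <stdint.h>

/* Fields of an I-Type load/store instruction (Figure 17-1). */
struct itype {
    unsigned op;     /* bits 31..26: 6-bit opcode */
    unsigned base;   /* bits 25..21: base register */
    unsigned ft;     /* bits 20..16: FPU register */
    int32_t  offset; /* bits 15..0: signed 16-bit offset */
};

static struct itype decode_itype(uint32_t insn)
{
    struct itype f;
    f.op   = (insn >> 26) & 0x3Fu;
    f.base = (insn >> 21) & 0x1Fu;
    f.ft   = (insn >> 16) & 0x1Fu;
    /* Portable sign extension of the 16-bit immediate. */
    f.offset = (int32_t)((insn & 0xFFFFu) ^ 0x8000u) - 0x8000;
    return f;
}
```

For example, `decode_itype(0xC7A2FFFC)` yields op = 0x31 (110001, the MIPS LWC1 opcode), base = 29, ft = 2, and offset = -4.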


All coprocessor loads and stores reference aligned data. For word loads and
stores, the access type field is always WORD, and the low-order two bits of the
address must always be zero. For doubleword loads and stores, the access type
field is always DOUBLEWORD, and the low-order three bits of the address must
always be zero.
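The alignment rule above reduces to a mask test on the low-order address bits. The minimal C sketch below uses hypothetical `ACCESS_WORD` and `ACCESS_DOUBLEWORD` codes in place of the access type field. It only checks the rule and does not model how the processor handles a misaligned address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical codes standing in for the access type field. */
enum access_type { ACCESS_WORD, ACCESS_DOUBLEWORD };

/* WORD: low-order two bits must be zero.
   DOUBLEWORD: low-order three bits must be zero. */
static bool is_aligned(uint64_t addr, enum access_type type)
{
    uint64_t mask = (type == ACCESS_DOUBLEWORD) ? 0x7u : 0x3u;
    return (addr & mask) == 0;
}
```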


Regardless of byte-numbering order (endianness), the address specifies that byte 
which has the smallest byte-address in the accessed field. For a big-endian 
system, this is the leftmost byte; for a little-endian system, this is the rightmost 
byte. 
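The C sketch below illustrates the byte-addressing rule. It returns the byte at the smallest address of a 32-bit word for either byte order. The function name is hypothetical, and the word is passed by its numeric value instead of being read from memory.

```c
#include <stdint.h>

/* Byte stored at the lowest address of a 32-bit word.
   Big-endian: the most-significant ("leftmost") byte.
   Little-endian: the least-significant ("rightmost") byte. */
static uint8_t lowest_address_byte(uint32_t word, int big_endian)
{
    return big_endian ? (uint8_t)(word >> 24) : (uint8_t)(word & 0xFFu);
}
```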
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17.4 Floating-Point Computational Instructions 


Computational instructions include all of the floating-point computational 
operations performed by the FPU. 


Figure 17-2 shows the R-Type instruction format used for computational
operations.


R-Type (Register)

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |  ft   |  fs   |  fd   |function|
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6

COP1      is a 6-bit opcode
fmt       is a 5-bit format specifier
fs        is a 5-bit source register
ft        is a 5-bit second source register
fd        is a 5-bit destination register
function  is a 6-bit function field

Figure 17-2 Computational Instruction Format
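The following C function packs the R-Type fields of Figure 17-2 into an instruction word. The COP1 opcode (010001), the ADD and DIV function codes, and the fmt codes come from this chapter's encodings. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define OP_COP1 0x11u  /* 010001: COP1 major opcode */
#define FMT_S   16u    /* Table 17-4: single precision */
#define FMT_D   17u    /* Table 17-4: double precision */
#define FN_ADD  0u     /* Table 17-5: ADD */
#define FN_DIV  3u     /* Table 17-5: DIV */

/* Assemble a COP1 computational instruction from its fields. */
static uint32_t encode_rtype(uint32_t fmt, uint32_t ft, uint32_t fs,
                             uint32_t fd, uint32_t function)
{
    return (OP_COP1 << 26)          /* bits 31..26 */
         | ((fmt & 0x1Fu) << 21)    /* bits 25..21 */
         | ((ft & 0x1Fu) << 16)     /* bits 20..16 */
         | ((fs & 0x1Fu) << 11)     /* bits 15..11 */
         | ((fd & 0x1Fu) << 6)      /* bits 10..6  */
         | (function & 0x3Fu);      /* bits 5..0   */
}
```

For example, `encode_rtype(FMT_S, 4, 2, 0, FN_ADD)` gives 0x46041000. That word is ADD.S with fd = 0, fs = 2, and ft = 4, written in assembler order as `ADD.S fd, fs, ft`.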


The function field indicates the floating-point operation to be performed. 


Each floating-point instruction can be applied to a number of operand formats. 
The operand format for an instruction is specified by the 5-bit format field (fmt); 
decoding for this field is shown in Table 17-4. 


Table 17-4 Format Field Decoding 


Code   Mnemonic  Size              Format
16     S         Single (32 bits)  Binary floating-point
17     D         Double (64 bits)  Binary floating-point
18     -         Reserved
19     -         Reserved
20     W         32 bits           Binary fixed-point
21     L         64 bits           Binary fixed-point
22-31  -         Reserved
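A decoder would typically map the fmt field through Table 17-4. The minimal C sketch below does this with a hypothetical function name. It returns NULL for every code the table does not assign to a format.

```c
#include <stddef.h>

/* fmt field -> mnemonic per Table 17-4; NULL for unassigned codes. */
static const char *fmt_mnemonic(unsigned fmt)
{
    switch (fmt) {
    case 16: return "S";  /* single, 32-bit binary floating-point */
    case 17: return "D";  /* double, 64-bit binary floating-point */
    case 20: return "W";  /* 32-bit binary fixed-point */
    case 21: return "L";  /* 64-bit binary fixed-point */
    default: return NULL; /* 18, 19, 22-31 reserved; others not listed */
    }
}
```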


Table 17-5 lists all floating-point computational instructions. 
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Table 17-5 Floating-Point Computational Instructions and Operations 


Code   Mnemonic  Operation
0      ADD       Add
1      SUB       Subtract
2      MUL       Multiply
3      DIV       Divide
4      SQRT      Square root
5      ABS       Absolute value
6      MOV       Transfer
7      NEG       Sign reverse
8      ROUND.L   Convert to 64-bit fixed-point, rounded to nearest/even
9      TRUNC.L   Convert to 64-bit fixed-point, rounded toward zero
10     CEIL.L    Convert to 64-bit fixed-point, rounded toward +∞
11     FLOOR.L   Convert to 64-bit fixed-point, rounded toward -∞
12     ROUND.W   Convert to 32-bit fixed-point, rounded to nearest/even
13     TRUNC.W   Convert to 32-bit fixed-point, rounded toward zero
14     CEIL.W    Convert to 32-bit fixed-point, rounded toward +∞
15     FLOOR.W   Convert to 32-bit fixed-point, rounded toward -∞
16-31  -         Reserved
32     CVT.S     Convert to single floating-point
33     CVT.D     Convert to double floating-point
34     -         Reserved
35     -         Reserved
36     CVT.W     Convert to 32-bit fixed-point
37     CVT.L     Convert to 64-bit fixed-point
38-47  -         Reserved
48-63  C         Floating-point compare
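The four rounding directions named in Table 17-5 can be sketched in C as follows. This sketch illustrates the rounding rules only. It assumes a finite input well inside the int64_t range, and it does not model the NaN, infinity, overflow, or inexact cases that the FPU reports through exceptions. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum fixed_round { RND_NEAREST_EVEN, RND_ZERO, RND_PLUS_INF, RND_MINUS_INF };

/* Convert x to a 64-bit integer using the given rounding direction. */
static int64_t to_fixed(double x, enum fixed_round dir)
{
    int64_t t  = (int64_t)x;                    /* C casts truncate toward zero */
    int64_t fl = (x < (double)t) ? t - 1 : t;   /* floor(x) */
    double frac = x - (double)fl;               /* fractional part, in [0, 1) */

    switch (dir) {
    case RND_ZERO:      return t;
    case RND_MINUS_INF: return fl;
    case RND_PLUS_INF:  return (frac > 0.0) ? fl + 1 : fl;
    case RND_NEAREST_EVEN:
        if (frac > 0.5) return fl + 1;
        if (frac < 0.5) return fl;
        return (fl % 2 == 0) ? fl : fl + 1;     /* tie: pick the even neighbor */
    }
    return t;
}
```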
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In the following pages, the notation FGR refers to the 32 FPU general purpose
registers FGR0 through FGR31, and FPR refers to the floating-point registers of
the FPU.


An FGR (in some places described as a CPR instead) is used by the load/store
instructions and by the instructions that transfer data to and from the CPU. An
FPR is used by the transfer, arithmetic, and conversion instructions in CP1.


• When the FR bit (bit 26) of the Status register equals zero, only the
  even floating-point registers are valid, and the 32 FGRs are 32 bits
  wide.

• When the FR bit (bit 26) of the Status register equals one, both odd
  and even FPRs can be used, and the 32 FGRs are 64 bits wide.


The following routines are used in the description of the floating-point operations 
to retrieve the value of an FPR or to change the value of an FGR: 


32-Bit Mode

value <-- ValueFPR(fpr, fmt)
    /* undefined for odd fpr */
    case fmt of
        S, W:
            value <-- FGR[fpr+0]
        D:
            value <-- FGR[fpr+1] || FGR[fpr+0]
    end

StoreFPR(fpr, fmt, value):
    /* undefined for odd fpr */
    case fmt of
        S, W:
            FGR[fpr+1] <-- undefined
            FGR[fpr+0] <-- value
        D:
            FGR[fpr+1] <-- value63..32
            FGR[fpr+0] <-- value31..0
    end
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64-Bit Mode

value <-- ValueFPR(fpr, fmt)
    case fmt of
        S, W:
            value <-- FGR[fpr]31..0
        D, L:
            value <-- FGR[fpr]
    end

StoreFPR(fpr, fmt, value):
    case fmt of
        S, W:
            FGR[fpr] <-- undefined^32 || value
        D, L:
            FGR[fpr] <-- value
    end
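The ValueFPR and StoreFPR routines above can be modeled in C to show how the FR bit changes register addressing. The sketch is hypothetical throughout:

- Each FGR is held as a uint64_t, and only its low 32 bits are used when fr is 0.
- Where the pseudocode leaves bits undefined, the sketch zeroes them in 64-bit mode and leaves them unchanged in 32-bit mode. Both are arbitrary choices, not the hardware's behavior.
- Format L is treated like D in 32-bit mode.

```c
#include <stdint.h>

/* Hypothetical FGR file. With fr == 0 only the low 32 bits are used. */
static uint64_t fgr64[32];

enum fpr_fmt { FMT_S, FMT_D, FMT_W, FMT_L };

#define LOW32 0xFFFFFFFFull

/* ValueFPR: with fr == 0, fpr must be even (odd fpr is undefined and
   this sketch does not check it). */
static uint64_t value_fpr(int fr, unsigned fpr, enum fpr_fmt f)
{
    int narrow = (f == FMT_S || f == FMT_W);
    if (fr)                                   /* 64-bit mode */
        return narrow ? (fgr64[fpr] & LOW32) : fgr64[fpr];
    if (narrow)                               /* 32-bit mode, S or W */
        return fgr64[fpr] & LOW32;
    /* 32-bit mode, D (and L): high-order word lives in the odd register */
    return ((fgr64[fpr + 1] & LOW32) << 32) | (fgr64[fpr] & LOW32);
}

/* StoreFPR */
static void store_fpr(int fr, unsigned fpr, enum fpr_fmt f, uint64_t value)
{
    int narrow = (f == FMT_S || f == FMT_W);
    if (fr) {                                 /* 64-bit mode */
        fgr64[fpr] = narrow ? (value & LOW32) : value;
    } else if (narrow) {                      /* 32-bit mode, S or W */
        fgr64[fpr] = value & LOW32;           /* FGR[fpr+1] left as-is */
    } else {                                  /* 32-bit mode, D (and L) */
        fgr64[fpr]     = value & LOW32;       /* low-order word  */
        fgr64[fpr + 1] = value >> 32;         /* high-order word */
    }
}
```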


17.5 FPU Instructions 
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This section describes in detail the floating-point (FPU) instructions. 


The exceptions that may occur as a result of executing each instruction are listed
at the end of that instruction's description. For details of the exceptions and
exception processing, refer to Chapter 8 Floating-Point Exceptions.


User's Manual U10504EJ7VOUM00 


FPU Instruction Set Details 


ABS.fmt Absolute Value ABS.fmt 


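 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |   0   |  fs   |  fd   |  ABS   |
| 010001 |       | 00000 |       |       | 000101 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6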

Format: 
ABS.fmt fd, fs 

Description: 
The absolute value of the contents of floating-point register fs is taken and
stored in floating-point register fd. The operand is processed in floating-point
format fmt.
The absolute value operation is performed arithmetically; therefore, if the
operand is NaN, the invalid operation exception occurs.
This instruction is valid only in the single- and double-precision floating-point
formats.
If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined.
If the FR bit of the Status register is 1, both odd and even register numbers are valid.

Operation: 

32, 64  T: StoreFPR (fd, fmt, AbsoluteValue (ValueFPR (fs, fmt)))

Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Unimplemented operation exception 
Invalid operation exception 
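ABS.fmt is performed arithmetically, so a NaN operand raises the invalid operation exception instead of just having its sign bit cleared. The minimal C sketch below shows that rule for the single-precision format, working on raw IEEE 754 bit patterns. The function name is hypothetical. The exception is modeled by a false return value, and the unimplemented operation cases are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* ABS.S on a raw bit pattern. Returns false where the invalid
   operation exception would occur (operand is NaN). */
static bool abs_single(uint32_t in, uint32_t *out)
{
    uint32_t exp  = (in >> 23) & 0xFFu;
    uint32_t frac = in & 0x7FFFFFu;
    if (exp == 0xFFu && frac != 0)   /* NaN: invalid operation */
        return false;
    *out = in & 0x7FFFFFFFu;         /* clear the sign bit */
    return true;
}
```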
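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ADD.fmt Floating-point Add ADD.fmt

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |  ft   |  fs   |  fd   |  ADD   |
| 010001 |       |       |       |       | 000000 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6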


Format: 


ADD.fmt fd, fs, ft 


Description: 


The contents of floating-point registers fs and ft are added, and the result is
stored in floating-point register fd. The operands are processed in floating-point
format fmt. The operation is performed as if with infinite precision, and the
result is rounded according to the current rounding mode.


This instruction is valid only in the single- and double-precision floating-point 
formats. 


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


Operation: 


32, 64  T: StoreFPR (fd, fmt, ValueFPR (fs, fmt) + ValueFPR (ft, fmt))


Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 
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Unimplemented operation exception 
Invalid operation exception 

Inexact operation exception 
Overflow exception 

Underflow exception 
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BC1F Branch On FPU False BC1F
(Coprocessor 1)


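 31    26 25   21 20   16 15               0
+--------+-------+-------+------------------+
|  COP1  |  BC   |  BCF  |      offset      |
| 010001 | 01000 | 00000 |                  |
+--------+-------+-------+------------------+
    6        5       5           16
Format:
BC1F offset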
Description: 
The branch target address is the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the
CP1 condition signal (COC[1]) sampled while the immediately preceding instruction
is being executed is false (0), the program branches to the branch target address
with a delay of one instruction.
Because the comparison result is sampled while the immediately preceding
instruction is being executed, at least one instruction must be placed between
the floating-point compare instruction and this instruction.
Operation:
32  T-1:  condition <-- not COC[1]
    T:    target <-- (offset15)^14 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          endif
64  T-1:  condition <-- not COC[1]
    T:    target <-- (offset15)^46 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          endif
Exceptions: 


Coprocessor unusable exception 
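The C sketch below follows the branch target calculation in the Operation section, using the 64-bit form, where the shifted offset is sign-extended to 64 bits. The function names are hypothetical. `delay_slot_pc` stands for the address of the instruction in the delay slot. In the not-taken case, execution continues with the instruction after the delay slot.

```c
#include <stdint.h>

/* target = sign-extended (offset << 2), added to the delay-slot address. */
static uint64_t branch_target(uint64_t delay_slot_pc, uint16_t offset)
{
    int64_t disp = (((int64_t)offset ^ 0x8000) - 0x8000) * 4; /* sign-extend, << 2 */
    return delay_slot_pc + (uint64_t)disp;                     /* wraps modulo 2^64 */
}

/* Next address after the delay slot for BC1F: the branch target when
   COC[1] is false (0), otherwise the instruction after the delay slot. */
static uint64_t bc1f_next_pc(int coc1, uint64_t delay_slot_pc, uint16_t offset)
{
    return coc1 ? delay_slot_pc + 4 : branch_target(delay_slot_pc, offset);
}
```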
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BC1FL Branch On FPU False Likely BC1FL
(Coprocessor 1)


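 31    26 25   21 20   16 15               0
+--------+-------+-------+------------------+
|  COP1  |  BC   | BCFL  |      offset      |
| 010001 | 01000 | 00010 |                  |
+--------+-------+-------+------------------+
    6        5       5           16
Format:
BC1FL offset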
Description: 
The branch target address is the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the
CP1 condition signal (COC[1]) sampled while the immediately preceding instruction
is being executed is false (0), the program branches to the branch target address
with a delay of one instruction. If the branch is not taken, the instruction in the
branch delay slot is nullified.
Because the comparison result is sampled while the immediately preceding
instruction is being executed, at least one instruction must be placed between
the floating-point compare instruction and this instruction.
Operation:
32  T-1:  condition <-- not COC[1]
    T:    target <-- (offset15)^14 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          else
              NullifyCurrentInstruction
          endif
64  T-1:  condition <-- not COC[1]
    T:    target <-- (offset15)^46 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          else
              NullifyCurrentInstruction
          endif
Exceptions: 
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Coprocessor unusable exception 
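The ordinary FPU branches and the Likely variants differ only in what happens to the delay slot. The C sketch below captures the rule with hypothetical names. BC1F and BC1T always execute the delay-slot instruction. BC1FL and BC1TL execute it only when the branch is taken, and otherwise nullify it.

```c
#include <stdbool.h>

struct branch_outcome {
    bool taken;            /* control moves to the branch target */
    bool delay_slot_runs;  /* delay-slot instruction executes (not nullified) */
};

/* coc1:    condition signal COC[1] sampled for the branch
   on_true: true for BC1T/BC1TL, false for BC1F/BC1FL
   likely:  true for the BC1FL/BC1TL forms */
static struct branch_outcome bc1_outcome(bool coc1, bool on_true, bool likely)
{
    struct branch_outcome o;
    o.taken = (coc1 == on_true);
    o.delay_slot_runs = likely ? o.taken : true;
    return o;
}
```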
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BC1T Branch On FPU True BC1T
(Coprocessor 1)

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
 31    26 25   21 20   16 15               0
+--------+-------+-------+------------------+
|  COP1  |  BC   |  BCT  |      offset      |
| 010001 | 01000 | 00001 |                  |
+--------+-------+-------+------------------+
    6        5       5           16
Format:
BC1T offset
Description:
The branch target address is the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the
CP1 condition signal (COC[1]) sampled while the immediately preceding instruction
is being executed is true (1), the program branches to the branch target address
with a delay of one instruction.
Because the comparison result is sampled while the immediately preceding
instruction is being executed, at least one instruction must be placed between
the floating-point compare instruction and this instruction.
Operation:
32  T-1:  condition <-- COC[1]
    T:    target <-- (offset15)^14 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          endif
64  T-1:  condition <-- COC[1]
    T:    target <-- (offset15)^46 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          endif
Exceptions:

Coprocessor unusable exception
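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BC1TL Branch On FPU True Likely BC1TL
(Coprocessor 1)

 31    26 25   21 20   16 15               0
+--------+-------+-------+------------------+
|  COP1  |  BC   | BCTL  |      offset      |
| 010001 | 01000 | 00011 |                  |
+--------+-------+-------+------------------+
    6        5       5           16
Format:
BC1TL offset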
Description: 


The branch target address is the sum of the address of the instruction in the
delay slot and the 16-bit offset, shifted left two bits and sign-extended. If the
result of the last floating-point compare is true (1), the program branches to the
branch target address with a delay of one instruction. If the branch is not taken,
the instruction in the branch delay slot is nullified.

Because the comparison result is sampled while the immediately preceding
instruction is being executed, at least one instruction must be placed between
the floating-point compare instruction and this instruction.


Operation:
32  T-1:  condition <-- COC[1]
    T:    target <-- (offset15)^14 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          else
              NullifyCurrentInstruction
          endif
64  T-1:  condition <-- COC[1]
    T:    target <-- (offset15)^46 || offset || 0^2
    T+1:  if condition then
              PC <-- PC + target
          else
              NullifyCurrentInstruction
          endif


Exceptions: 


Coprocessor unusable exception 
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C.cond.fmt Floating-point Compare C.cond.fmt

 31    26 25   21 20   16 15   11 10    6 5  4 3     0
+--------+-------+-------+-------+-------+----+-------+
|  COP1  |  fmt  |  ft   |  fs   |   0   |FC* | cond* |
| 010001 |       |       |       | 00000 | 11 |       |
+--------+-------+-------+-------+-------+----+-------+
    6        5       5       5       5     2      4
Format:
C.cond.fmt fs, ft

Description: 


The contents of floating-point register fs are compared with those of
floating-point register ft according to compare condition cond, and the result is
set in condition signal COC[1]. The operands are processed in floating-point
format fmt. If either value is NaN and the most-significant bit of cond is set, the
invalid operation exception occurs. The result of the comparison is the condition
tested by the FPU branch instructions; at least one instruction is required between
this instruction and the FPU branch instruction.

The comparison is exact and neither overflows nor underflows. Exactly one of four
mutually exclusive relations results: "less than", "equal to", "greater than", or
"cannot be compared". If either or both operands are NaN, the result of the
comparison is always "cannot be compared".
During comparison, the sign of 0 is ignored (+0 = -0).


This instruction is valid only in the single- and double-precision floating-point 
format. 


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


* See 17.6 FPU Instruction Opcode Bit Encoding. 
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C.cond.fmt Floating-point Compare C.cond.fmt
(continued)

Operation:

32, 64  T:  if NaN (ValueFPR (fs, fmt)) or NaN (ValueFPR (ft, fmt)) then
                less <-- false
                equal <-- false
                unordered <-- true
                if cond3 then
                    signal InvalidOperationException
                endif
            else
                less <-- ValueFPR (fs, fmt) < ValueFPR (ft, fmt)
                equal <-- ValueFPR (fs, fmt) = ValueFPR (ft, fmt)
                unordered <-- false
            endif
            condition <-- (cond2 and less) or (cond1 and equal) or
                          (cond0 and unordered)
            FCR[31]23 <-- condition
            COC[1] <-- condition


Exceptions: 


Coprocessor unusable exception
Floating-point exception 


Floating-Point Exceptions: 


Unimplemented operation exception 
Invalid operation exception 
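The Operation pseudocode for C.cond.fmt translates almost directly into C. The sketch below tests the four cond bits as the pseudocode does:

- bit 3 requests an invalid operation exception on NaN;
- bits 2, 1, and 0 select less, equal, and unordered.

The struct and function names are hypothetical. The exception is only reported in the result, not raised.

```c
#include <math.h>
#include <stdbool.h>

struct compare_result {
    bool condition;  /* value written to FCR31 bit 23 and COC[1] */
    bool invalid;    /* invalid operation exception would be signalled */
};

static struct compare_result fp_compare(double fs, double ft, unsigned cond)
{
    struct compare_result r = { false, false };
    bool less, equal, unordered;

    if (isnan(fs) || isnan(ft)) {
        less = false;
        equal = false;
        unordered = true;
        r.invalid = (cond & 0x8u) != 0;          /* cond3 */
    } else {
        less = fs < ft;
        equal = fs == ft;                        /* +0 == -0 in C, as required */
        unordered = false;
    }
    r.condition = ((cond & 0x4u) && less) ||     /* cond2 */
                  ((cond & 0x2u) && equal) ||    /* cond1 */
                  ((cond & 0x1u) && unordered);  /* cond0 */
    return r;
}
```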
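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CEIL.L.fmt Floating-point Ceiling To Long CEIL.L.fmt
Fixed-point Format

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |   0   |  fs   |  fd   | CEIL.L |
| 010001 |       | 00000 |       |       | 001010 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6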

Format: 
CEIL.L.fmt fd, fs 
Description: 


The contents of floating-point register fs are arithmetically converted into 64-bit
fixed-point format, and the result is stored in floating-point register fd. The
source operand is processed in floating-point format fmt.

The result of the conversion is rounded toward +∞, regardless of the current
rounding mode.


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


If the source operand is infinity or NaN, or if the rounded result is outside the
range 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^63 - 1 is
returned.

This operation is defined in 64-bit mode and 32-bit Kernel mode. If this
instruction is executed in 32-bit User or Supervisor mode, a reserved instruction
exception occurs.
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CEIL.L.fmt Floating-point Ceiling To Long CEIL.L.fmt
Fixed-point Format
(continued)

Operation:

32, 64  T: StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt), fmt, L))


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
Floating-Point Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact operation exception
Overflow exception
Restrictions:
An unimplemented operation exception occurs in the following cases:
• If an overflow occurs during conversion to integer format
• If the source operand is infinity
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
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CEIL.W.fmt Floating-point Ceiling To Single CEIL.W.fmt
Fixed-point Format

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |   0   |  fs   |  fd   | CEIL.W |
| 010001 |       | 00000 |       |       | 001110 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6

Format:
CEIL.W.fmt fd, fs
Description: 


The contents of floating-point register fs are arithmetically converted into 32-bit
fixed-point format, and the result is stored in floating-point register fd. The
source operand is processed in floating-point format fmt.

The result of the conversion is rounded toward +∞, regardless of the current
rounding mode.


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


If the source operand is infinity or NaN, or if the rounded result is outside the
range 2^31 - 1 to -2^31, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^31 - 1 is
returned.
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CEIL.W.fmt Floating-point Ceiling To Single CEIL.W.fmt
Fixed-point Format
(continued)

Operation:

32, 64  T: StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt), fmt, W))


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Floating-Point Exceptions: 
Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 
Restrictions:
An unimplemented operation exception occurs in the following cases:
• If an overflow occurs during conversion to integer format
• If the source operand is infinity
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
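The CEIL.W.fmt conversion rule (round toward +∞, then check the 32-bit range) can be sketched in C without the math library. The function name is hypothetical. The `invalid` flag stands in for the invalid operation exception, and the value returned on that path is the 2^31 - 1 result described for the disabled case. The inexact and unimplemented operation exceptions are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

static int32_t ceil_w(double x, bool *invalid)
{
    /* ceil(x) fits in [-2^31, 2^31 - 1] exactly when -2^31 - 1 < x <= 2^31 - 1.
       The negated test is also true for NaN and for both infinities. */
    if (!(x > -2147483649.0 && x <= 2147483647.0)) {
        *invalid = true;
        return INT32_MAX;              /* result when the exception is disabled */
    }
    *invalid = false;
    int64_t t = (int64_t)x;            /* truncates toward zero */
    if ((double)t < x)
        t += 1;                        /* step up to the ceiling */
    return (int32_t)t;
}
```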
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CFC1 Move Control Word From FPU CFC1
(Coprocessor 1)

 31    26 25   21 20   16 15   11 10            0
+--------+-------+-------+-------+---------------+
|  COP1  |  CF   |  rt   |  fs   |       0       |
| 010001 | 00010 |       |       | 000 0000 0000 |
+--------+-------+-------+-------+---------------+
    6        5       5       5          11

Format: 
CFC1 rt, fs 

Description: 
The contents of floating-point control register fs are loaded into general
purpose register rt.
This instruction is defined only when fs equals 0 or 31.
The contents of general purpose register rt are undefined while the instruction
immediately following this load instruction is being executed.

Operation: 

32  T:    temp <-- FCR[fs]
    T+1:  GPR[rt] <-- temp

64  T:    temp <-- FCR[fs]
    T+1:  GPR[rt] <-- (temp31)^32 || temp

Exceptions: 


Coprocessor unusable exception 
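In 64-bit mode, the CFC1 Operation sign-extends the 32-bit control register value, (temp31)^32 || temp, before writing GPR[rt]. The minimal C sketch below shows that step, with a hypothetical function name.

```c
#include <stdint.h>

/* Replicate bit 31 of temp into bits 63..32, then append temp. */
static uint64_t cfc1_writeback64(uint32_t temp)
{
    uint64_t hi = (temp & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0;
    return hi | temp;
}
```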
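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CTC1 Move Control Word To FPU CTC1
(Coprocessor 1)

 31    26 25   21 20   16 15   11 10            0
+--------+-------+-------+-------+---------------+
|  COP1  |  CT   |  rt   |  fs   |       0       |
| 010001 | 00110 |       |       | 000 0000 0000 |
+--------+-------+-------+-------+---------------+
    6        5       5       5          11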
Format: 
CTC1 rt, fs 
Description: 


The contents of general purpose register rt are loaded into floating-point control
register fs. This instruction is defined only when fs is 0 or 31.

If writing data to the floating-point control/status register (FCR31) sets a Cause
bit together with its corresponding Enable bit, the floating-point exception
occurs. The data is written to the register before the exception occurs.

The contents of floating-point control register fs are undefined while the
instruction immediately following this instruction is being executed.


Operation: 


32  T:    temp <-- GPR[rt]
    T+1:  FCR[fs] <-- temp
          COC[1] <-- FCR[31]23
64  T:    temp <-- GPR[rt]31..0
    T+1:  FCR[fs] <-- temp
          COC[1] <-- FCR[31]23


Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Invalid operation exception 
Unimplemented operation exception 
Division by zero exception 

Inexact operation exception 
Overflow exception 

Underflow exception 
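The CTC1 exception condition requires a Cause bit and its matching Enable bit to both be set in FCR31. The C sketch below assumes the conventional MIPS FCR31 layout, which this section does not restate:

- Cause bits are 16..12 and Enable bits are 11..7, each ordered V, Z, O, U, I from high to low.
- The condition bit C is bit 23, the bit copied to COC[1].

The function names are hypothetical. The unimplemented operation Cause bit, which has no Enable bit, is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed FCR31 layout (conventional MIPS):
   Cause  bits 16..12 = V Z O U I
   Enable bits 11..7  = V Z O U I
   bit 23             = C (compare condition, copied to COC[1]) */
static bool ctc1_raises_fpe(uint32_t fcr31)
{
    uint32_t cause  = (fcr31 >> 12) & 0x1Fu;
    uint32_t enable = (fcr31 >> 7) & 0x1Fu;
    return (cause & enable) != 0;   /* a Cause bit with its Enable bit set */
}

static bool fcr31_condition(uint32_t fcr31)
{
    return ((fcr31 >> 23) & 1u) != 0;
}
```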
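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CVT.D.fmt Floating-point Convert To Double CVT.D.fmt
Floating-point Format

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |   0   |  fs   |  fd   | CVT.D  |
| 010001 |       | 00000 |       |       | 100001 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6

Format:

CVT.D.fmt fd, fs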


Description: 


The contents of floating-point register fs are arithmetically converted into
double-precision floating-point format, and the result is stored in floating-point
register fd. The source operand is processed in floating-point format fmt.

This instruction is valid only for conversion from the single-precision
floating-point format or the 32-bit or 64-bit fixed-point format.

Conversion from the single-precision floating-point format or the 32-bit
fixed-point format is exact; no precision is lost.


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


Operation: 


32, 64  T: StoreFPR (fd, D, ConvertFmt (ValueFPR (fs, fmt), fmt, D))


Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
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CVT.D.fmt Floating-point Convert To Double CVT.D.fmt
Floating-point Format
(continued)


Restrictions:
An unimplemented operation exception occurs in the following cases:
• If an overflow occurs during conversion to integer format
• If the source operand is infinity
• If the source operand is NaN

Conversion from floating-point format to fixed-point format:

Essentially, if any of bits 53 to 62 of the result of a conversion from a
floating-point format to a fixed-point format is 1, an unimplemented operation
exception occurs. This includes cases in which an overflow occurs during
conversion.

Conversion from fixed-point format to floating-point format:

Essentially, if 64-bit fixed-point data in which any of bits 55 to 62 is 1 is
converted to floating-point format, an unimplemented operation exception
occurs.
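The two bit-range rules above can be written as mask tests on a raw 64-bit value. The C sketch below applies them exactly as worded. The function names are hypothetical, and the sketch makes no further claim about how the hardware treats negative (two's complement) values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Floating-point -> fixed-point: any of bits 53..62 of the result is 1. */
static bool f2fix_unimplemented(uint64_t result)
{
    const uint64_t bits_53_to_62 = 0x7FE0000000000000ull;
    return (result & bits_53_to_62) != 0;
}

/* 64-bit fixed-point -> floating-point: any of bits 55..62 of the source is 1. */
static bool fix2f_unimplemented(uint64_t source)
{
    const uint64_t bits_55_to_62 = 0x7F80000000000000ull;
    return (source & bits_55_to_62) != 0;
}
```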
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CVT.L.fmt Floating-point Convert To Long CVT.L.fmt
Fixed-point Format

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |   0   |  fs   |  fd   | CVT.L  |
| 010001 |       | 00000 |       |       | 100101 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6
Format:
CVT.L.fmt fd, fs
Description: 


The contents of floating-point register fs are arithmetically converted into a 64-bit 
fixed-point format, and the result is stored to floating-point register fd. The source 
operand is processed in the floating-point format fmt. 


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


If the source operand is infinity or NaN, or if the rounded result is outside the
range 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^63 - 1 is
returned.

This operation is defined in 64-bit mode and 32-bit Kernel mode. If this
instruction is executed in 32-bit User or Supervisor mode, a reserved instruction
exception occurs.
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CVT.L.fmt Floating-point Convert To Long CVT.L.fmt
Fixed-point Format
(continued)

Operation:

64  T: StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt), fmt, L))

Remark: The same operation is performed in 32-bit Kernel mode.


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
Floating-Point Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact operation exception
Overflow exception
Restrictions:
An unimplemented operation exception occurs in the following cases:
• If an overflow occurs during conversion to integer format
• If the source operand is infinity
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
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CVT.S.fmt Floating-point Convert To Single CVT.S.fmt
Floating-point Format

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |   0   |  fs   |  fd   | CVT.S  |
| 010001 |       | 00000 |       |       | 100000 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6

Format:

CVT.S.fmt fd, fs


Description: 


The contents of floating-point register fs are arithmetically converted into
single-precision floating-point format, and the result is stored in floating-point
register fd. The source operand is processed in floating-point format fmt. The
result of the conversion is rounded according to the current rounding mode.

This instruction is valid only for conversion from the double-precision
floating-point format or the 32-bit or 64-bit fixed-point format.


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


Operation: 


32, 64  T: StoreFPR (fd, S, ConvertFmt (ValueFPR (fs, fmt), fmt, S))


Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 

Underflow exception 
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CVT.S.fmt Floating-point Convert To Single CVT.S.fmt
Floating-point Format
(continued)

Restrictions:
An unimplemented operation exception occurs in the following cases:
• If an overflow occurs during conversion to integer format
• If the source operand is infinity
• If the source operand is NaN

Conversion from floating-point format to fixed-point format:

Essentially, if any of bits 53 to 62 of the result of a conversion from a
floating-point format to a fixed-point format is 1, an unimplemented operation
exception occurs. This includes cases in which an overflow occurs during
conversion.

Conversion from fixed-point format to floating-point format:

Essentially, if 64-bit fixed-point data in which any of bits 55 to 62 is 1 is
converted to floating-point format, an unimplemented operation exception
occurs.
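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CVT.W.fmt Floating-point Convert To Single CVT.W.fmt
Fixed-point Format

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |   0   |  fs   |  fd   | CVT.W  |
| 010001 |       | 00000 |       |       | 100100 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6

Format:
CVT.W.fmt fd, fs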
Description: 


The contents of floating-point register fs are arithmetically converted into a 32-bit 
fixed-point format, and the result is stored to floating-point register fd. The source 
operand is processed in the floating-point format fmt. 


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only even register numbers can be
specified, because adjacent even-numbered and odd-numbered registers are used
in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


If the source operand is infinity or NaN, or if the rounded result is outside the
range 2^31 - 1 to -2^31, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^31 - 1 is
returned.
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CVT.W.fmt Floating-point Convert To Single CVT.W.fmt
Fixed-point Format
(continued)

Operation:

32, 64  T: StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt), fmt, W))


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Floating-Point Exceptions: 
Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 
Restrictions:
An unimplemented operation exception occurs in the following cases:
• If an overflow occurs during conversion to integer format
• If the source operand is infinity
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
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DIV.fmt Floating-point Divide DIV.fmt

 31    26 25   21 20   16 15   11 10    6 5      0
+--------+-------+-------+-------+-------+--------+
|  COP1  |  fmt  |  ft   |  fs   |  fd   |  DIV   |
| 010001 |       |       |       |       | 000011 |
+--------+-------+-------+-------+-------+--------+
    6        5       5       5       5       6

Format:
DIV.fmt fd, fs, ft

Description: 
The contents of floating-point register fs are divided by those of floating-point 
register ft, and the result are stored to floating-point register rd. The operand is 
processed in the floating-point format fmt. The operation is executed as if the 
accuracy were infinite, and the result is rounded according to the current rounding 
mode. 
This instruction is valid only in the single- and double-precision floating-point
formats.
If the FR bit of the Status register is 0, only an even number can be specified as a 
register number because adjacent even-numbered and odd-numbered registers are 
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and 
even register numbers are valid. 
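The register-number rule above can be written as a small check. In this sketch, `fr` stands for the FR bit of the Status register and `fpr_number_valid` is a hypothetical helper name; "valid" means only that the result is defined, not that the hardware checks the number.

```python
def fpr_number_valid(reg: int, fr: int) -> bool:
    """Return True if `reg` may be used as a floating-point register number.

    With FR = 0, even/odd registers are paired, so only even numbers give a
    defined result; with FR = 1, any register 0..31 is valid.
    """
    if not 0 <= reg <= 31:
        raise ValueError("register number must fit the 5-bit field (0..31)")
    return fr == 1 or reg % 2 == 0
```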

Operation: 

32, 64 T: StoreFPR (fd, fmt, ValueFPR (fs, fmt)/ValueFPR (ft, fmt) )

Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Unimplemented operation exception
Invalid operation exception
Division-by-zero exception
Inexact operation exception
Overflow exception
Underflow exception
Doubleword Move From FPU
DMFC1 (Coprocessor 1) DMFC1 


31 26 25 21 20 16 15 11 10 0 
COP1 DMF rt fs 0 
010001 00001 000 0000 0000 
6 5 5 5 11 


Format: 
DMFC1 rt, fs


Description: 


The contents of Floating-Point General Purpose register fs are stored into CPU 
general purpose register rt. 


The contents of general purpose register rt are undefined while the instruction
immediately following this instruction is being executed.


The FR bit of the Status register indicates whether all 32 floating-point registers
of the processor can be specified. If the FR bit is 0 and the least-significant bit
of fs is 1 (an odd register number), the operation is undefined. If the FR bit is 1,
both odd-numbered and even-numbered registers are valid.


This operation is defined in 64-bit mode or 32-bit Kernel mode. 


DMFC1 Doubleword Move From FPU DMFC1


(Coprocessor 1) 
(continued) 


Operation: 


64    T:   if SR_26 = 1 then
               data ← FGR [fs]
           elseif fs_0 = 0 then
               data ← FGR [fs + 1] || FGR [fs]
           else
               data ← undefined^64
           endif
      T+1: GPR [rt] ← data


Remark Same operation in the 32-bit Kernel mode. 
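The data selection at stage T can be modeled in Python. The sketch assumes a list of 32 register values in which, for FR = 0, each entry holds one 32-bit half of a register pair and, for FR = 1, each entry holds a full 64-bit register; `dmfc1_data` is a hypothetical name, and None stands for the undefined result.

```python
from typing import List, Optional

MASK32 = 0xFFFF_FFFF


def dmfc1_data(fgr: List[int], fs: int, fr: int) -> Optional[int]:
    """Return the 64-bit value DMFC1 transfers to GPR[rt], or None if undefined."""
    if fr == 1:
        return fgr[fs]  # the whole 64-bit register
    if (fs & 1) == 0:
        # FGR[fs + 1] supplies bits 63..32, FGR[fs] supplies bits 31..0
        return ((fgr[fs + 1] & MASK32) << 32) | (fgr[fs] & MASK32)
    return None  # FR = 0 with an odd fs: undefined
```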


Exceptions: 


Coprocessor unusable exception 
Floating-point exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)


Floating-Point Exceptions: 


Unimplemented operation exception 


User’s Manual U10504EJ7VOUM0O 583 


Chapter 17 


Doubleword Move To FPU 
DMTC1 (Coprocessor 1) DMTC1 


31 26 25 21 20 16 15 11 10 0 
COP1 DMT rt fs 0 
010001 00101 000 0000 0000 
6 5 5 5 11 


Format: 
DMTC1 rt, fs


Description: 


The contents of general purpose register rt are loaded into Floating-Point General 
Purpose register fs. 


The contents of fs are undefined while the instruction immediately following this 
instruction is being executed. 


The FR bit of the Status register indicates whether all 32 floating-point registers
of the processor can be specified. If the FR bit is 0 and the least-significant bit
of fs is 1 (an odd register number), the operation is undefined. If the FR bit is 1,
both odd-numbered and even-numbered registers are valid.


This operation is defined in 64-bit mode or 32-bit Kernel mode. 
DMTC1 Doubleword Move To FPU DMTC1
(Coprocessor 1)
(continued)


Operation: 
64    T:   data ← GPR [rt]
      T+1: if SR_26 = 1 then
               FGR [fs] ← data
           elseif fs_0 = 0 then
               FGR [fs + 1] ← data_63..32
               FGR [fs] ← data_31..0
           else
               undefined_result
           endif
Remark Same operation in the 32-bit Kernel mode. 
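The register write at stage T+1 can be sketched in Python, using the same assumed register model as a list of 32 values (32-bit halves for FR = 0, 64-bit registers for FR = 1). The name `dmtc1` is hypothetical, and the ValueError only marks the case the manual leaves undefined; the hardware raises no exception there.

```python
from typing import List

MASK32 = 0xFFFF_FFFF


def dmtc1(fgr: List[int], fs: int, data: int, fr: int) -> None:
    """Write the 64-bit GPR value `data` into the FPU registers as DMTC1 does."""
    data &= (1 << 64) - 1
    if fr == 1:
        fgr[fs] = data                 # whole 64-bit register
    elif (fs & 1) == 0:
        fgr[fs + 1] = data >> 32       # bits 63..32 go to the odd register
        fgr[fs] = data & MASK32        # bits 31..0 go to the even register
    else:
        raise ValueError("undefined result: odd fs with FR = 0")
```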
Exceptions: 


Coprocessor unusable exception 


Floating-point exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)


Floating-Point Exceptions: 


Unimplemented operation exception 


FLOOR.L.fmt Floating-point Floor To Long FLOOR.L.fmt


Fixed-point Format 


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt 0 fs fd FLOOR.L 
010001 00000 001011 
6 5 5 5 5 6 
Format: 
FLOOR.L.fmt fd, fs 
Description: 


The contents of floating-point register fs are arithmetically converted into a 64-bit 
fixed-point format, and the result is stored to floating-point register fd. The source 
operand is processed in the floating-point format fmt. 


The result of the conversion is rounded toward -∞, regardless of the
current rounding mode.


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only an even number can be specified as a 
register number because adjacent even-numbered and odd-numbered registers are 
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and 
even register numbers are valid. 


If the source operand is infinite or NaN, or if the rounded result is outside the
range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^63 - 1 is
returned.


This operation is defined in the 64-bit mode and 32-bit Kernel mode. If this 
instruction is executed during 32-bit User/Supervisor mode, a reserved instruction 
exception occurs. 
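A Python sketch of the FLOOR.L.fmt result with the invalid operation exception disabled follows. It uses `math.floor` for rounding toward -∞, returns the default 2^63 - 1 for NaN, infinity, or out-of-range results, and does not model the VR4300 unimplemented operation exception listed under Restrictions; `floor_l` is a hypothetical name.

```python
import math

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def floor_l(value: float) -> int:
    """Sketch of FLOOR.L.fmt with the invalid operation exception disabled."""
    if math.isnan(value) or math.isinf(value):
        return INT64_MAX
    result = math.floor(value)  # exact; Python ints are unbounded
    if not INT64_MIN <= result <= INT64_MAX:
        return INT64_MAX        # outside the 64-bit range: default result
    return result
```

Rounding toward -∞ means that -2.5 becomes -3, whereas 2.5 becomes 2.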
FLOOR.L.fmt Floating-point Floor To Long FLOOR.L.fmt
Fixed-point Format
(continued)


Operation: 


64 T: StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt) , fmt, L) )


Remark Same operation in the 32-bit Kernel mode. 


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
Floating-Point Exceptions: 
Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 
Restrictions: 
An unimplemented operation exception will occur in the following cases. 
• If an overflow occurs during conversion to integer format
• If the source operand is an infinite number
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
FLOOR.W.fmt Floating-point Floor To Single FLOOR.W.fmt
Fixed-point Format


31 26 25 21 20 16 15 11 10 6 5 0
COP1 fmt 0 fs fd FLOOR.W 
010001 00000 001111 
6 5 5 5 5 6 
Format: 
FLOOR.W.fmt fd, fs 
Description: 


The contents of floating-point register fs are arithmetically converted into a 32-bit
fixed-point format, and the result is stored to floating-point register fd. The source
operand is processed in the floating-point format fmt.


The result of the conversion is rounded toward -∞, regardless of the
current rounding mode.


This instruction is valid only for conversion from the single- or double-precision
floating-point format.


If the FR bit of the Status register is 0, only an even number can be specified as a
register number because adjacent even-numbered and odd-numbered registers are
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and
even register numbers are valid.


If the source operand is infinite or NaN, or if the rounded result is outside the
range of 2^31 - 1 to -2^31, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^31 - 1 is
returned.
FLOOR.W.fmt Floating-point Floor To Single FLOOR.W.fmt
Fixed-point Format 
(continued) 


Operation: 


32,64 T: StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt) , fmt, W) ) 


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Floating-Point Exceptions: 
Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 
Restrictions: 
An unimplemented operation exception will occur in the following cases. 
• If an overflow occurs during conversion to integer format
• If the source operand is an infinite number
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
Load Doubleword To FPU
LDC1 (Coprocessor 1) LDC1


31 26 25 21 20 16 15 0 
LDC1 base ft offset 
110101 
6 5 5 16 
Format: 


LDC1 ft, offset (base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. 


If the FR bit of the Status register is 0, the contents of the doubleword at the 
memory location specified by the virtual address are loaded to floating-point 
registers ft and ft+1. At this time, the high-order 32 bits of the doubleword are 
stored to an odd-numbered register specified by ft+1, and the low-order 32 bits are 
stored to an even-numbered register specified by ft. The operation is undefined if 
the least significant bit in the ft field is not 0. 


If the FR bit is 1, the contents of the doubleword at the memory location specified 
by the virtual address are loaded to floating-point register ft.


If any of the low-order three bits of the address are not zero, an address error 
exception occurs. 
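The address formation and alignment check can be sketched in Python as follows, assuming 64-bit mode so that the sum wraps at 64 bits. `sign_extend16` and `ldc1_address` are hypothetical names, and a ValueError stands in for the address error exception.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(offset: int) -> int:
    """Interpret the low 16 bits of `offset` as a signed two's-complement value."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset


def ldc1_address(base_value: int, offset: int) -> int:
    """Return the LDC1 virtual address for GPR[base] = base_value (64-bit mode assumed).

    Raises ValueError in place of the address error exception when any of
    the low-order three bits of the address is nonzero.
    """
    vaddr = (base_value + sign_extend16(offset)) & MASK64
    if vaddr & 0b111:
        raise ValueError(f"address error: {vaddr:#x} is not doubleword aligned")
    return vaddr
```

For example, a base of 0x1000 with the offset field 0xFFF8 (-8) yields 0xff8, while an offset of 4 from the same base is rejected.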
Load Doubleword To FPU
LDC1 (Coprocessor 1) LDC1
(continued)


Operation: 


32    T:   vAddr ← ((offset_15)^16 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           data ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
           if SR_26 = 1 then
               FGR [ft] ← data
           elseif ft_0 = 0 then
               FGR [ft + 1] ← data_63..32
               FGR [ft] ← data_31..0
           else
               undefined_result
           endif

64    T:   vAddr ← ((offset_15)^48 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           data ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
           if SR_26 = 1 then
               FGR [ft] ← data
           elseif ft_0 = 0 then
               FGR [ft + 1] ← data_63..32
               FGR [ft] ← data_31..0
           else
               undefined_result
           endif


Exceptions: 


Coprocessor unusable exception
TLB miss exception 
TLB invalid exception 
Bus error exception 
Address error exception 
Load Word To FPU
LWC1 (Coprocessor 1) LWC1


31 26 25 21 20 16 15 0 
LWC1 base ft offset 
110001 
6 5 5 16 
Format: 


LWC1 ft, offset (base) 


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the word at the memory 
location specified by the virtual address are loaded to floating-point register ft. 


If the FR bit of the Status register is 0 and the least-significant bit in the ft field
is 0, the contents of the word are stored to the low-order 32 bits of floating-point
register ft. If the least-significant bit in the ft field is 1, the contents of the word
are stored to the high-order 32 bits of floating-point register ft-1.


If the FR bit is 1, all the 64-bit floating-point registers can be accessed; therefore, 
the contents of the word are stored to floating-point register ft. The value of the 
high-order 32 bits is undefined. 


If either of the low-order two bits of the address is not zero, an address error 
exception occurs. 
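The register placement rules above can be modeled in Python. The sketch assumes 32 physical 64-bit registers held in a list; for FR = 1 it sets the undefined high-order 32 bits to zero purely for determinism. `lwc1_store` is a hypothetical name.

```python
from typing import List

MASK32 = 0xFFFF_FFFF
HIGH32 = 0xFFFF_FFFF_0000_0000


def lwc1_store(fpr64: List[int], ft: int, word: int, fr: int) -> None:
    """Place a loaded 32-bit word into a model of 32 physical 64-bit FPU registers."""
    word &= MASK32
    if fr == 1:
        # Low-order 32 bits receive the word; the high-order 32 bits are
        # undefined on hardware and are simply zeroed in this model.
        fpr64[ft] = word
    elif (ft & 1) == 0:
        # FR = 0, even ft: replace the low-order 32 bits of register ft.
        fpr64[ft] = (fpr64[ft] & HIGH32) | word
    else:
        # FR = 0, odd ft: replace the high-order 32 bits of register ft - 1.
        fpr64[ft - 1] = (fpr64[ft - 1] & MASK32) | (word << 32)
```

With FR = 0, loading words to ft = 6 and then ft = 7 therefore fills both halves of physical register 6.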




Load Word To FPU
LWC1 (Coprocessor 1) LWC1
(continued)
Operation: 
32    T:   vAddr ← ((offset_15)^16 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           data ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
           if SR_26 = 1 then
               FGR [ft] ← undefined^32 || data
           else
               FGR [ft] ← data
           endif

64    T:   vAddr ← ((offset_15)^48 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           data ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
           if SR_26 = 1 then
               FGR [ft] ← undefined^32 || data
           else
               FGR [ft] ← data
           endif
Exceptions: 


Coprocessor unusable exception 
TLB miss exception 

TLB invalid exception 

Bus error exception 

Address error exception 
Move Word From FPU
MFC1 (Coprocessor 1) MFC1 
31 26 25 21 20 16 15 11 10 0 
COP1 MF rt fs 0 
010001 00000 000 0000 0000
6 5 5 5 11 


Format: 
MFC1 rt, fs 


Description: 


The contents of floating-point general purpose register fs are stored to CPU
general purpose register rt.


The contents of general purpose register rt are undefined while the instruction
immediately following this instruction is being executed.


If the FR bit of the Status register is 0 and the least-significant bit in the fs field
is 0, the low-order 32 bits of floating-point register fs are stored to general
purpose register rt. If the least-significant bit in the fs field is 1, the high-order
32 bits of floating-point register fs-1 are stored to general purpose register rt.


If the FR bit is 1, all the 64-bit floating-point registers can be accessed; therefore,
the low-order 32 bits of floating-point register fs are stored to general purpose
register rt.


Operation: 

32    T:   data ← FGR [fs]_31..0
      T+1: GPR [rt] ← data

64    T:   data ← FGR [fs]_31..0
      T+1: GPR [rt] ← (data_31)^32 || data
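In 64-bit mode, the T+1 step sign-extends the 32-bit value by replicating bit 31 into bits 63..32. A minimal Python sketch, with the hypothetical name `mfc1_64`, returning the result as an unsigned 64-bit bit pattern:

```python
def mfc1_64(fpr_low_word: int) -> int:
    """Return GPR[rt] = (data_31)^32 || data as an unsigned 64-bit pattern."""
    data = fpr_low_word & 0xFFFF_FFFF
    if data & 0x8000_0000:                    # bit 31 set: fill the upper half with 1s
        return data | 0xFFFF_FFFF_0000_0000
    return data                               # bit 31 clear: upper half stays 0
```

For example, the single-precision pattern for -1.0 (0xBF800000) is transferred as 0xFFFFFFFFBF800000.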
Exceptions:


Coprocessor unusable exception 


MOV.fmt Floating-point Move MOV.fmt 


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt 0 fs fd MOV 
010001 00000 000110 
6 5 5 5 5 6 

Format: 
MOV.fmt fd, fs 

Description: 
The contents of floating-point register fs are stored to floating-point register fd. 
The operand is processed in the floating-point format fmt. 
This instruction does not perform an arithmetic operation, so no IEEE 754
exception occurs.
This instruction is valid only in the single- and double-precision floating-point 
formats. 
If the FR bit of the Status register is 0, only an even number can be specified as a
register number because adjacent even-numbered and odd-numbered registers are
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and even
register numbers are valid.

Operation: 

32, 64 T: StoreFPR (fd, fmt, ValueFPR (fs, fmt) )

Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Unimplemented operation exception 
Move Word To FPU
MTC1 (Coprocessor 1) MTC1


31 26 25 21 20 16 15 11 10 0 
COP1 MT rt fs 0 
010001 00100 0000000 0000 
6 5 5 5 11 
Format: 
MTC1 rt, fs
Description: 


The contents of CPU general purpose register rt are loaded into floating-point
general purpose register fs.


The contents of floating-point register fs are undefined while the instruction
immediately following this instruction is being executed.


The FR bit of the Status register specifies how the floating-point general purpose
registers are accessed.


If the FR bit is 0, all 32 floating-point general purpose registers can be
accessed. When transferring double-precision data, write the high-order 32 bits to
the odd-numbered register and the low-order 32 bits to the even-numbered register
of the pair used by floating-point operation instructions.


If the FR bit is 1, all 32 floating-point general purpose registers can be
accessed, but only the low-order 32 bits of the register receive the data; the
high-order 32 bits become undefined.


Operation: 


32, 64  T:   data ← GPR [rt]_31..0
        T+1: if SR_26 = 1 then
                 FGR [fs] ← undefined^32 || data
             else
                 FGR [fs] ← data
             endif


Exceptions: 


Coprocessor unusable exception 




MUL.fmt Floating-point Multiply MUL.fmt


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt ft fs fd MUL 
010001 000010 
6 5 5 5 5 6 

Format: 


MUL.fmt fd, fs, ft 


Description: 


The contents of floating-point register fs are multiplied by those of floating-point
register ft, and the result is stored to floating-point register fd. The operands are
processed in the floating-point format fmt.


This instruction is valid only in the single- and double-precision floating-point
formats.


If the FR bit of the Status register is 0, only an even number can be specified as a
register number because adjacent even-numbered and odd-numbered registers are
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and
even register numbers are valid.


Operation: 


32, 64 T: StoreFPR (fd, fmt, ValueFPR (fs, fmt) * ValueFPR (ft, fmt) )


Exceptions: 


Coprocessor unusable exception 


Floating-point exception 


Floating-Point Exceptions: 


Unimplemented operation exception 


Invalid operation exception 


Inexact operation exception 
Overflow exception 


Underflow exception 
NEG.fmt Floating-point Negate NEG.fmt


31 26 25 21 20 16 15 11 10 6 5 0
COP1 fmt 0 fs fd NEG 
010001 00000 000111 
6 5 5 5 5 6 

Format: 


NEG.fmt fd, fs


Description: 


The sign of the contents of floating-point register fs is inverted, and the result is
stored to floating-point register fd. The operand is processed in the floating-point
format fmt.


Because the sign is inverted arithmetically, specifying NaN as the operand is an
invalid operation.


This instruction is valid only in the single- and double-precision floating-point
formats.


If the FR bit of the Status register is 0, only an even number can be specified as a
register number because adjacent even-numbered and odd-numbered registers are
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and
even register numbers are valid.
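A bit-level Python sketch of NEG.fmt for fmt = S follows, assuming IEEE 754 single-precision encoding: negation flips bit 31, and a NaN operand is reported as invalid instead of being negated. The names `is_nan_single` and `neg_s` are hypothetical, and the value written when the invalid operation exception is disabled is not modeled (None).

```python
from typing import Optional, Tuple


def is_nan_single(bits: int) -> bool:
    """True if the 32-bit pattern encodes a NaN (exponent all ones, fraction nonzero)."""
    return ((bits >> 23) & 0xFF) == 0xFF and (bits & 0x7F_FFFF) != 0


def neg_s(bits: int) -> Tuple[Optional[int], bool]:
    """Return (result_bits, invalid) for a NEG.fmt (fmt = S) operand pattern."""
    bits &= 0xFFFF_FFFF
    if is_nan_single(bits):
        return None, True              # invalid operation; result not modeled
    return bits ^ 0x8000_0000, False   # flip the sign bit (also for zero and infinity)
```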


Operation: 


32, 64 T: StoreFPR (fd, fmt, Negate (ValueFPR (fs, fmt) ) )


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Floating-Point Exceptions: 


Unimplemented operation exception 
Invalid operation exception 
ROUND.L.fmt Floating-point Round To Long ROUND.L.fmt


Fixed-point Format 


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt 0 fs fd ROUND.L 
010001 00000 001000 
6 5 5 5 5 6 
Format: 
ROUND.L.fmt fd, fs 
Description: 


The contents of floating-point register fs are converted into the 64-bit fixed-point 
format, and the result is stored to floating-point register fd. The source operand is 
processed in the floating-point format fmt. 


The result of the conversion is rounded to the nearest value (to the even value in
case of a tie), regardless of the current rounding mode.


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only an even number can be specified as a 
register number because adjacent even-numbered and odd-numbered registers are 
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and 
even register numbers are valid. 


If the source operand is infinite or NaN, or if the rounded result is outside the
range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^63 - 1 is
returned.


This operation is defined in the 64-bit mode and 32-bit Kernel mode. If this 
instruction is executed during 32-bit User/Supervisor mode, a reserved instruction 
exception occurs. 
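The ties-to-even rule used by ROUND.L.fmt can be illustrated with a short Python function. It handles only the rounding step for finite inputs (range checking is omitted), and `round_half_even` is a hypothetical name. Python's built-in `round()` with no digits argument follows the same rule, which the comparison in the check relies on.

```python
import math


def round_half_even(value: float) -> int:
    """Round a finite float to the nearest integer, choosing the even one on ties."""
    lower = math.floor(value)
    diff = value - lower          # exact: it is just the fractional bits of value
    if diff > 0.5:
        return lower + 1
    if diff < 0.5:
        return lower
    return lower if lower % 2 == 0 else lower + 1   # exact tie: pick the even integer
```

Ties therefore go to the even neighbor: 0.5 rounds to 0, 1.5 and 2.5 both round to 2, and -2.5 rounds to -2.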
ROUND.L.fmt Floating-point Round To Long ROUND.L.fmt
Fixed-point Format
(continued)


Operation: 


64 T: StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt) , fmt, L) ) 


Remark Same operation in the 32-bit Kernel mode.


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
Floating-Point Exceptions: 
Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 
Restrictions: 
An unimplemented operation exception will occur in the following cases. 
• If an overflow occurs during conversion to integer format
• If the source operand is an infinite number
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 


ROUND.W.fmt Floating-point Round To Single ROUND.W.fmt
Fixed-point Format


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt 0 fs fd ROUND.W 
010001 00000 001100 
6 5 5 5 5 6 
Format: 
ROUND.W.fmt fd, fs 
Description: 


The contents of floating-point register fs are converted into the 32-bit fixed-point 
format, and the result is stored to floating-point register fd. The source operand is 
processed in the floating-point format fmt. 


The result of the conversion is rounded to the nearest value (to the even value in
case of a tie), regardless of the current rounding mode.


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only an even number can be specified as a 
register number because adjacent even-numbered and odd-numbered registers are 
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and 
even register numbers are valid. 


If the source operand is infinite or NaN, or if the rounded result is outside the
range of 2^31 - 1 to -2^31, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^31 - 1 is
returned.
ROUND.W.fmt Floating-point Round To Single ROUND.W.fmt
Fixed-point Format
(continued)


Operation: 


32, 64 T: StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt) , fmt, W) ) 


Exceptions: 


Coprocessor unusable exception 
Floating-point exception 
Floating-Point Exceptions: 
Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 
Restrictions: 
An unimplemented operation exception will occur in the following cases. 
• If an overflow occurs during conversion to integer format
• If the source operand is an infinite number
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
Store Doubleword From FPU
SDC1 (Coprocessor 1) SDC1


31 26 25 21 20 16 15 0 
SDC1 base ft offset 
111101 
6 5 5 16 
Format: 
SDC1 ft, offset(base) 
Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. 


If the FR bit of the Status register is 0, the contents of floating-point registers ft
and ft+1 are stored as a doubleword to the memory location specified by the virtual
address. The contents of the odd-numbered register ft+1 form the high-order 32
bits of the doubleword, and the contents of the even-numbered register ft form the
low-order 32 bits.


If the FR bit is 0 and the least significant bit in the ft field is not 0, the operation is undefined.


If the FR bit is 1, the contents of floating-point register ft are stored to the memory 
location specified by the virtual address as a doubleword. 


If any of the low-order three bits of the address are not zero, an address error 
exception occurs. 




Store Doubleword From FPU
SDC1 (Coprocessor 1) SDC1
(continued)


Operation: 


32    T:   vAddr ← ((offset_15)^16 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           if SR_26 = 1 then
               data ← FGR [ft]_63..0
           elseif ft_0 = 0 then
               data ← FGR [ft + 1]_31..0 || FGR [ft]_31..0
           else
               data ← undefined^64
           endif
           StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

64    T:   vAddr ← ((offset_15)^48 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           if SR_26 = 1 then
               data ← FGR [ft]_63..0
           elseif ft_0 = 0 then
               data ← FGR [ft + 1]_31..0 || FGR [ft]_31..0
           else
               data ← undefined^64
           endif
           StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)


Exceptions: 


Coprocessor unusable exception

TLB miss exception 

TLB invalid exception 

TLB modification exception 
Bus error exception 
Address error exception 


SQRT.fmt Floating-point Square Root SQRT.fmt


31 26 25 21 20 16 15 11 10 6 5 0
COP1 fmt 0 fs fd SQRT
010001 00000 000100
6 5 5 5 5 6


Format: 


SQRT.fmt fd, fs 


Description: 


The positive arithmetic square root of the contents of floating-point register fs is
calculated, and the result is stored to floating-point register fd. The operand is
processed in the floating-point format fmt. The result is calculated as if to infinite
precision and then rounded according to the current rounding mode. If the value of
the source operand is -0, the result is -0.
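For fmt = D in round-to-nearest mode, the behavior can be sketched with Python's `math.sqrt`, which returns the correctly rounded IEEE 754 square root on platforms whose C library follows IEEE 754. The explicit `value < 0.0` test lets -0 through, because -0.0 < 0.0 is false, so -0 yields -0 as described above. A ValueError stands in for the invalid operation exception, and `sqrt_d` is a hypothetical name.

```python
import math


def sqrt_d(value: float) -> float:
    """Sketch of SQRT.fmt with fmt = D in round-to-nearest mode."""
    if value < 0.0:  # -0.0 is not less than 0.0, so -0 falls through
        raise ValueError("invalid operation: square root of a negative number")
    return math.sqrt(value)  # sqrt(-0.0) keeps the negative sign
```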


This instruction is valid only in the single- and double-precision floating-point
formats.


If the FR bit of the Status register is 0, only an even number can be specified as a 
register number because adjacent even-numbered and odd-numbered registers are 
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and 
even register numbers are valid. 


Operation: 


32, 64 T: StoreFPR (fd, fmt, SquareRoot (ValueFPR (fs, fmt) ) )


Exceptions: 


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Unimplemented operation exception 
Invalid operation exception 
Inexact operation exception 
SUB.fmt Floating-point Subtract SUB.fmt


31 26 25 21 20 16 15 11 10 6 5 0
COP1 fmt ft fs fd SUB
010001 000001
6 5 5 5 5 6


Format: 


SUB.fmt fd, fs, ft 


Description: 


The contents of floating-point register ft are subtracted from those of floating-point
register fs, and the result is stored to floating-point register fd. The result is
calculated as if to infinite precision and then rounded according to the current
rounding mode.


This instruction is valid only in the single- and double-precision floating-point
formats.


If the FR bit of the Status register is 0, only an even number can be specified as a 
register number because adjacent even-numbered and odd-numbered registers are 
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and 
even register numbers are valid. 


Operation: 


32, 64 T: StoreFPR (fd, fmt, ValueFPR (fs, fmt) - ValueFPR (ft, fmt) )
Exceptions:


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Unimplemented operation exception 
Invalid operation exception 

Inexact operation exception 
Overflow exception 

Underflow exception 


Store Word From FPU
SWC1 (Coprocessor 1) SWC1


31 26 25 21 20 16 15 0
SWC1 base ft offset
111001
6 5 5 16


Format: 
SWC1 ft, offset (base)


Description: 


The 16-bit offset is sign-extended and added to the contents of general purpose 
register base to form a virtual address. The contents of the floating-point general 
purpose register ft are stored at the memory location of the specified address. 


If the FR bit of the Status register is 0 and the least-significant bit in the ft field is
0, the contents of the low-order 32 bits of floating-point register ft are stored. If 
the least-significant bit in the ft field is 1, the contents of the high-order 32 bits of 
floating-point register ft-1 are stored. 


If the FR bit is 1, all the 64-bit floating-point registers can be accessed; therefore,
the low-order 32 bits of floating-point register ft are stored.


If either of the low-order two bits of the address is not zero, an address error
exception occurs.
Store Word From FPU
SWC1 (Coprocessor 1) SWC1


(continued) 


Operation: 


32    T:   vAddr ← ((offset_15)^16 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           data ← FGR [ft]_31..0
           StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

64    T:   vAddr ← ((offset_15)^48 || offset_15..0) + GPR [base]
           (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
           data ← FGR [ft]_31..0
           StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)


Exceptions: 


Coprocessor unusable exception

TLB miss exception 

TLB invalid exception 

TLB modification exception 
Bus error exception 
Address error exception 


TRUNC.L.fmt Floating-point Truncate To Long TRUNC.L.fmt
Fixed-point Format


31 26 25 21 20 16 15 11 10 6 5 0 
COP1 fmt 0 fs fd TRUNC.L 
010001 00000 001001 
6 5 5 5 5 6 
Format: 
TRUNC.L.fmt fd, fs 
Description: 


The contents of floating-point register fs are converted into the 64-bit fixed-point 
format, and the result is stored to floating-point register fd. The source operand is 
processed in the floating-point format fmt. 


The result of the conversion is rounded toward zero, regardless of the
current rounding mode.


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only an even number can be specified as a 
register number because adjacent even-numbered and odd-numbered registers are 
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both the odd and 
even register numbers are valid. 


If the source operand is infinite or NaN, or if the rounded result is outside the
range of 2^63 - 1 to -2^63, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^63 - 1 is
returned.


This operation is defined in the 64-bit mode and 32-bit Kernel mode. If this 
instruction is executed during 32-bit User/Supervisor mode, a reserved instruction 
exception occurs. 
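Truncation toward zero (used here) and the rounding toward -∞ of FLOOR.L.fmt differ only for negative non-integer values. A short Python comparison, with the hypothetical name `trunc_and_floor` and without range checking:

```python
import math
from typing import Tuple


def trunc_and_floor(value: float) -> Tuple[int, int]:
    """Return (rounded toward zero, rounded toward minus infinity) for a finite float."""
    return math.trunc(value), math.floor(value)
```

For -2.7 this gives (-2, -3), while for 2.7 or -3.0 both roundings agree.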
TRUNC.L.fmt Floating-point Truncate To Long TRUNC.L.fmt
Fixed-point Format
(continued)


Operation: 


64 T: StoreFPR (fd, L, ConvertFmt (ValueFPR (fs, fmt) , fmt, L) ) 


Remark: The operation is the same in 32-bit Kernel mode.


Exceptions: 
Coprocessor unusable exception 
Floating-point exception 
Reserved instruction exception (VR4300 in 32-bit User or Supervisor mode)
Floating-Point Exceptions: 
Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 
Restrictions: 
An unimplemented operation exception will occur in the following cases.
• If an overflow occurs during conversion to integer format
• If the source operand is an infinite number
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
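A VR4300-specific pre-check for TRUNC.L can be sketched from the Restrictions above. The sketch makes an assumption the text does not spell out: it reads "bits 53 to 62 of the result" as bits of the magnitude of the truncated integer, so the condition reduces to |result| >= 2^53 (an overflow beyond 64 bits is included automatically). Treat `vr4300_trunc_l_unimplemented` as a hypothetical helper, not a description of the actual hardware logic.

```python
import math


def vr4300_trunc_l_unimplemented(x: float) -> bool:
    """True if TRUNC.L on x would take an unimplemented operation exception.

    ASSUMPTION: "any of bits 53 to 62 is 1" is applied to the magnitude of the
    truncated result, which makes it equivalent to |result| >= 2**53.
    """
    if math.isnan(x) or math.isinf(x):
        return True                  # NaN or infinite source operand
    magnitude = abs(math.trunc(x))   # TRUNC rounds toward zero
    return magnitude >= 1 << 53      # covers bits 53..62 and any overflow


for value in (1.5, 2.0**53 - 1, 2.0**53, -(2.0**60), math.inf):
    print(value, vr4300_trunc_l_unimplemented(value))
```

An exception handler for the unimplemented operation exception would then be expected to perform such conversions in software.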
TRUNC.W.fmt                                                  TRUNC.W.fmt
Floating-point Truncate To Single Fixed-point Format


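 31     26 25   21 20   16 15   11 10    6 5       0
+---------+-------+-------+-------+-------+---------+
|  COP1   |  fmt  |   0   |  fs   |  fd   | TRUNC.W |
| 010001  |       | 00000 |       |       | 001101  |
+---------+-------+-------+-------+-------+---------+
     6        5       5       5       5        6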
Format: 
TRUNC.W.fmt fd, fs 
Description: 


The contents of floating-point register fs are arithmetically converted into the 32-bit
(word) fixed-point format, and the result is stored in floating-point register fd. The
source operand is processed in the floating-point format fmt.


The result of the conversion is rounded toward zero, regardless of the
current rounding mode.
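The difference between TRUNC.W and a conversion that follows the current rounding mode shows up with negative, fractional inputs. In this Python sketch the mode names RN, RZ, RP, and RM are the usual MIPS FCR31 rounding-mode names, assumed here rather than taken from this section, and `convert_with_mode` is a hypothetical helper. Python's built-in `round()` rounds halfway cases to even, which matches IEEE round-to-nearest.

```python
import math


def convert_with_mode(x: float, mode: str) -> int:
    """Integer conversion that honors a rounding mode, as CVT.W.fmt does."""
    rounding = {
        "RN": round,        # to nearest, ties to even
        "RZ": math.trunc,   # toward zero
        "RP": math.ceil,    # toward +infinity
        "RM": math.floor,   # toward -infinity
    }
    return rounding[mode](x)


x = -2.7
for mode in ("RN", "RZ", "RP", "RM"):
    print(mode, convert_with_mode(x, mode))
# TRUNC.W ignores the mode and always produces the RZ result.
print("TRUNC.W", math.trunc(x))
```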


This instruction is valid only for conversion from the single- or double-precision 
floating-point format. 


If the FR bit of the Status register is 0, only an even number can be specified as a
register number, because adjacent even-numbered and odd-numbered registers are
used in pairs as floating-point registers. If an odd number is specified, the
operation is undefined. If the FR bit of the Status register is 1, both odd and
even register numbers are valid.


If the source operand is infinite or NaN, or if the rounded result is outside the
range of 2^31 - 1 to -2^31, the invalid operation exception occurs. If the invalid
operation exception is not enabled, the exception does not occur, and 2^31 - 1 is
returned.
TRUNC.W.fmt                                                  TRUNC.W.fmt
Floating-point Truncate To Single Fixed-point Format (continued)


Operation: 


32, 64   T: StoreFPR (fd, W, ConvertFmt (ValueFPR (fs, fmt), fmt, W))
Exceptions:


Coprocessor unusable exception 
Floating-point exception 


Floating-Point Exceptions: 


Invalid operation exception 
Unimplemented operation exception 
Inexact operation exception 
Overflow exception 


Restrictions: 


An unimplemented operation exception will occur in the following cases.
• If an overflow occurs during conversion to integer format
• If the source operand is an infinite number
• If the source operand is NaN


Essentially, if any of bits 53 to 62 of the result of conversion from a floating-point 
format to a fixed-point format is 1, an unimplemented operation exception will 
occur. This includes cases when there is an overflow during conversion. 
17.6 FPU Instruction Opcode Bit Encoding


Figure 17-3 lists the bit encoding of the FPU instructions.


Opcode (bits 31..26)
                                   28..26
 31..29    0       1       2       3       4       5       6       7
    0
    1
    2              COP1
    3
    4
    5
    6              LWC1                            LDC1
    7              SWC1                            SDC1

sub (bits 25..21)
                                   23..21
 25..24    0       1       2       3       4       5       6       7
    0      MF      DMFη    CF      γ       MT      DMTη    CT      γ
    1      BC      γ       γ       γ       γ       γ       γ       γ
    2      S       D       γ       γ       W       Lη      γ       γ
    3      γ       γ       γ       γ       γ       γ       γ       γ

br (bits 20..16)
                                   18..16
 20..19    0       1       2       3       4       5       6       7
    0      BCF     BCT     BCFL    BCTL    *       *       *       *
    1      *       *       *       *       *       *       *       *
    2      *       *       *       *       *       *       *       *
    3      *       *       *       *       *       *       *       *

Figure 17-3 Bit Encoding for FPU Instructions (1/2)

function (bits 5..0)
                                        2..0
  5..3     0         1         2         3         4         5         6         7
    0      ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
    1      ROUND.Lη  TRUNC.Lη  CEIL.Lη   FLOOR.Lη  ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
    2      γ         γ         γ         γ         γ         γ         γ         γ
    3      γ         γ         γ         γ         γ         γ         γ         γ
    4      CVT.S     CVT.D     γ         γ         CVT.W     CVT.Lη    γ         γ
    5      γ         γ         γ         γ         γ         γ         γ         γ
    6      C.F       C.UN      C.EQ      C.UEQ     C.OLT     C.ULT     C.OLE     C.ULE
    7      C.SF      C.NGLE    C.SEQ     C.NGL     C.LT      C.NGE     C.LE      C.NGT

Figure 17-3 Bit Encoding for FPU Instructions (2/2)

Key:

*   When an operation code marked with an asterisk is executed, the reserved
    instruction exception occurs. These codes are reserved for future expansion.

γ   Operation codes marked with a gamma cause unimplemented operation
    exceptions in all current implementations and are reserved for future
    expansion.

η   When an operation code marked with an eta is executed, the result is valid
    only when use of the MIPS III instruction set is enabled. If the operation
    code is executed when use of the instruction set is disabled (in 32-bit
    User/Supervisor mode), the unimplemented operation exception occurs.
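The function-field lookup in Figure 17-3 can be sketched as a small Python decoder. This is an illustration, not an NEC tool: the names `FUNCTION_TABLE`, `FMT_NAMES`, and `decode_cop1_function` are hypothetical, the decoder handles only words whose sub field selects a format (S, D, W, or L), and it does not check whether a given function is actually defined for that format.

```python
G = "γ"  # code causes an unimplemented operation exception

# Function field of Figure 17-3 (2/2): row = bits 5..3, column = bits 2..0.
FUNCTION_TABLE = [
    ["ADD", "SUB", "MUL", "DIV", "SQRT", "ABS", "MOV", "NEG"],
    ["ROUND.L", "TRUNC.L", "CEIL.L", "FLOOR.L",
     "ROUND.W", "TRUNC.W", "CEIL.W", "FLOOR.W"],
    [G] * 8,
    [G] * 8,
    ["CVT.S", "CVT.D", G, G, "CVT.W", "CVT.L", G, G],
    [G] * 8,
    ["C.F", "C.UN", "C.EQ", "C.UEQ", "C.OLT", "C.ULT", "C.OLE", "C.ULE"],
    ["C.SF", "C.NGLE", "C.SEQ", "C.NGL", "C.LT", "C.NGE", "C.LE", "C.NGT"],
]

# Sub field (bits 25..21) values that select a format, from Figure 17-3 (1/2).
FMT_NAMES = {0b10000: "S", 0b10001: "D", 0b10100: "W", 0b10101: "L"}


def decode_cop1_function(word: int) -> str:
    """Name the COP1 arithmetic instruction encoded in a 32-bit word."""
    if ((word >> 26) & 0x3F) != 0b010001:
        raise ValueError("opcode is not COP1")
    sub = (word >> 21) & 0x1F
    if sub not in FMT_NAMES:
        raise ValueError("sub field is not a format (MF, MT, BC, ...)")
    name = FUNCTION_TABLE[(word >> 3) & 0x7][word & 0x7]
    return name if name == G else f"{name}.{FMT_NAMES[sub]}"


print(decode_cop1_function(0x46201009))  # prints TRUNC.L.D
```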
Chapter 18


Several passive elements must be connected externally to the VR4300 for the
processor to operate normally. Connect the elements to the PLLCap0, PLLCap1,
VDDP, and GNDP pins.


Figure 18-1 shows the connections of the passive elements for the PLL.


[Figure: passive elements connecting VDD and GND to the VDDP, GNDP, PLLCap0, and PLLCap1 pins of the VR4300]


Remarks 1. C1, C2, C3, Cp1, Cp2, R, and L are mounted on the board.

        2. In a system where it has been confirmed by experiment that no noise is
           superimposed on VDDP and GNDP, only one of R and L is needed.

        3. The value of each element differs depending on the system. Determine
           the appropriate values for each system by experiment.


Figure 18-1 Connection Example of PLL Passive Elements 
Figure 18-2 shows an example layout of the 120-pin plastic QFP and the capacitors
on the PWB.

[Figure: board layout of the QFP and the bypass and PLL capacitors]


Remarks x        : GND-VDD bypass capacitors
        C2       : GNDP-VDDP bypass capacitors
        Cp1, Cp2 : PLL capacitors


Figure 18-2 Layout Example of QFP and Capacitors on PWB


Separate the power (VDDP) and ground (GNDP) wiring for the PLL from the normal
power (VDD) and ground (GND) wiring. An example value for each element is given
below.


R = 5 Ω      C1 = 1 nF     C2 = 82 nF
C3 = 10 µF   Cp = 470 pF


The optimum values of the filter elements differ depending on the application and
the noise environment of the system, so the values above are given for reference
only. Determine the optimum values for your application by trial and error. A
choke element (inductor: L) may be used instead of the resistor (R) as the power
filter.
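As a rough starting point for that trial and error, the corner frequency of the supply filter can be estimated. The sketch below rests on an assumption the manual does not state: that R in series with the supply and C1, C2, and C3 in parallel between VDDP and GNDP behave as a first-order RC low-pass filter. It ignores Cp, the inductor alternative, and all parasitics, and `rc_corner_hz` is a hypothetical helper.

```python
import math


def rc_corner_hz(r_ohm: float, *caps_farad: float) -> float:
    """Corner frequency f = 1 / (2*pi*R*C) of a first-order RC low-pass filter."""
    c_total = sum(caps_farad)            # capacitors in parallel simply add
    return 1.0 / (2.0 * math.pi * r_ohm * c_total)


# Example element values from the text: R = 5 ohm, C1 = 1 nF, C2 = 82 nF, C3 = 10 uF.
fc = rc_corner_hz(5.0, 1e-9, 82e-9, 10e-6)
print(f"approx. {fc:.0f} Hz")
```

With these values the 10 µF capacitor dominates, so the estimate lands in the low-kilohertz range; the smaller capacitors matter mainly for high-frequency bypassing, which this first-order model does not capture.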


User’s Manual U10504EJ7VOUM00 617 




Chapter 19

Coprocessor 0 Hazards


If a conflict of internal resources occurs between instructions (such as when the
contents of the destination register are used as the source for the next
instruction), the VR4300 interlocks the pipeline to prevent the conflict. Therefore,
it is not necessary to insert NOP instructions between such instructions.


However, the CP0 registers and the TLB are not interlocked. When developing a
program that uses the CP0 registers and the TLB, therefore, take conflicts of
internal resources into consideration. A CP0 hazard defines the number of NOP
instructions (or instructions independent of the conflict) that must be inserted
between instructions to avoid a conflict of internal resources. This chapter
explains the CP0 hazards.


The VR4300 CP0 hazards are equal to or less than those of the VR4400;
Table 19-1 lists the VR4300 CP0 hazards. Code that complies with these hazards
will run without modification on the VR4400 or VR4200.


The Source column of the following table lists the CP0 registers or bits whose
values an instruction uses: once that data has been determined, it can be used as a
source. The Destination column lists the CP0 registers or bits that an instruction
writes: the hazard number indicates when the written data becomes available.


The number of NOP instructions (or instructions independent of the conflict)
required between an instruction A and a subsequent instruction B that use the CP0
registers or the TLB can be calculated from this table with the following
expression:

Number of instructions = (number of destination hazards of instruction A)
                         - {(number of source hazards of instruction B) + 1}

For example, the number of instructions required between an MTC0 instruction and
a subsequent MFC0 instruction is:
(7) - (4 + 1) = 2 instructions
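The expression above is simple enough to encode directly, which can help when checking hand-written CP0 sequences. In this Python sketch the dictionaries hold a few hazard numbers from Table 19-1; for TLBWR/TLBWI the source number in the table is a range (5-8), and 5 is used here because that is the value Table 19-2 uses. The helper name and the clamping of negative results to zero are assumptions of the sketch.

```python
# Hazard numbers from Table 19-1 (TLBWR/TLBWI source: lower end of 5-8).
DEST_HAZARD = {"MTC0": 7, "TLBWR": 8, "TLBWI": 8, "TLBP": 7}
SOURCE_HAZARD = {"MFC0": 4, "TLBWR": 5, "TLBWI": 5}


def instructions_between(a: str, b: str) -> int:
    """Instructions (NOPs or independent work) needed between A and a later B."""
    needed = DEST_HAZARD[a] - (SOURCE_HAZARD[b] + 1)
    return max(needed, 0)   # assumption: a negative result means no padding


print(instructions_between("MTC0", "MFC0"))   # the worked example: 7 - (4 + 1)
```

The same function reproduces the calculated rows of Table 19-2, for example TLBWR followed by MFC0 EntryHi, 8 - (4 + 1) = 3.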


Caution The CP0 hazards do not cause pipeline interlocks. Therefore, the program
        itself must ensure that the required number of instructions is present.
Table 19-1 Coprocessor 0 Hazards

                          Source                                    Destination
Operation                 Name                           Number of  Name                           Number of
                                                         hazards                                   hazards
MTC0                                                                cpr rd                         7
MFC0                      cpr rd                         4
TLBR                      Index, TLB                     5-7        PageMask, EntryHi,             8
                                                                    EntryLo0, EntryLo1
TLBWI, TLBWR              Index or Random, PageMask,     5-8        TLB                            8
                          EntryHi, EntryLo0, EntryLo1
TLBP                      PageMask, EntryHi              3-6        Index                          7
ERET                      (illegible)                    4 γ        Status.EXL, Status.ERL         (illegible) α
                                                                    LLbit                          7
CACHE Index Load Tag                                                TagLo, TagHi, ECC              8 β
CACHE Index Store Tag     TagLo, TagHi, ECC              7
CACHE Hit ops.                                                      Status.CH                      8
Coprocessor usable test   Status.CU, Status.KSU,         2
                          Status.EXL, Status.ERL
Instruction fetch         EntryHi.ASID, Status.KSU,      0
                          Status.EXL, Status.ERL,
                          Status.RE, Config.K0
                          TLB                            2
Instruction fetch                                                   (illegible), Cause,            3
exception                                                           BadVAddr, Context
Interrupt                 Cause.IP, Status.IM,           3
                          Status.IE, Status.EXL,
                          Status.ERL
Load/Store                EntryHi.ASID, Status.KSU,      4
                          Status.EXL, Status.ERL,
                          Status.RE, Config.K0, TLB
                          WatchHi, WatchLo               4-5
Load/Store exception                                                (illegible), Cause,            8
                                                                    BadVAddr, Context
TLB shutdown                                                        Status.TS                      7
Remarks 1. A hazard applies whenever an instruction that uses the bit specified in
           the Source or Destination column is executed. For example, if CP1 is
           enabled by setting Status.CU1 to 1 with the MTC0 instruction, all
           instructions that use CP1 (FPU) are subject to the hazard.

        α  Status.EXL and Status.ERL are cleared in stage 8, but the effect of
           clearing them is visible at the time of an instruction fetch starting at
           the beginning of stage 4.

        β  One instruction separating Index Load Tag from an MFC0 of the Tag
           registers is sufficient, even though the table above would imply three
           instructions.

Cautions 1. The instruction following an MTC0 instruction must not be an MFC0
            instruction.

         2. The five instructions following an MTC0 instruction to the Status
            register that changes the KSU bit and sets the EXL or ERL bit may be
            executed in the new mode rather than in Kernel mode. This can be
            avoided by setting the EXL bit first, leaving the KSU bit set to
            Kernel, and changing the KSU bit later.

         3. There must be two non-load, non-CACHE instructions between a store
            instruction and a CACHE instruction directed to the same cache line as
            the store destination.

        γ  An ERET instruction following an MTC0 instruction that sets the ERL bit
           in the Status register (Status.ERL) must be separated from the MTC0
           instruction by three instructions.

If the K0 bit of the Config register is changed to non-cache mode by an MTC0
instruction, the non-cache setting takes effect from the instruction fetched two
instructions after the MTC0 instruction.

If a jump or branch instruction is executed immediately after the ITS bit of the
Status register has been set, a stall lasting for several instructions will occur.
The points at which CP0 hazards must be taken into consideration when each
instruction is executed are explained below.
(1) MTC0

Destination: Completion of writing to the destination register (CP0) by the MTC0
instruction


(2) MFC0
Source: Determination of the source register (CP0) of the MFC0 instruction
(3) TLBR

Source: Determination of the TLB status and the Index register before execution of
the TLBR instruction


(4) TLBWI, TLBWR 


Source: Determination of the source registers of the TLBWI and TLBWR instructions
and of the register used to specify the TLB entry
Destination: Completion of writing to the TLB by the TLBWI and TLBWR instructions


(5) TLBP 


Source: Determination of PageMask register and EntryHi register before 
execution of TLBP instruction 

Destination: Completion of writing result of TLBP instruction execution to Index 
register 


(6) ERET 


Source: Determination of register holding information necessary for ERET 
instruction execution 

Destination: Completion of processor status transition due to ERET instruction 
execution 


(7) CACHE Index Load Tag 
Destination: Completion of writing the result of this instruction to each register
(8) CACHE Index Store Tag 


Source: Determination of register holding information necessary for execution of 
this instruction 


(9) Coprocessor usable test

Source: Determination of the mode set by the bit values of the CP0 registers in the
Source column
Examples 1. When accessing a CP0 register in User mode after changing the
            Status.CU0 bit, or when executing an instruction that uses CP0
            resources (such as a TLB instruction, CACHE instruction, or branch
            instruction)

         2. When accessing a CP0 register in the operating mode that results
            from changing the Status.KSU, EXL, and ERL bits

         3. When using the FPU (CP1) after the Status.CU1 bit has been changed


(10) Instruction fetch 
Source: Determination of operating mode and TLB necessary for instruction fetch 


Examples 1. When fetching instructions after the mode has been changed from 
User to Kernel after the contents of the Status.KSU, EXL, and 
ERL bits have been changed 


2. When rewriting TLB and fetching an instruction by using its TLB 
entry 


(11) Instruction fetch exception 


Destination: Completion of writing to each register holding information related to 
an exception when the exception has occurred as a result of instruction fetch 


(12) Interrupt 


Source: Determination of each register that identifies an exception generation 
condition when an interrupt cause occurs 


(13) Load/store 


Source: Determination of the operating mode related to address generation by the
load/store instruction, determination of the TLB entry, determination of the cache
mode set by the Config.K0 bit, and determination of the registers that set the
watch exception generation condition


Example When executing the load/store instruction in the kernel area after 
the mode has been changed from User to Kernel 


(14) Load/store exception 


Destination: Completion of writing to each register holding information related to 
an exception when the exception occurs as a result of a load/store operation 


(15) TLB shutdown

Destination: Completion of writing to the Status.TS bit when a TLB shutdown occurs
Table 19-2 shows examples of calculating the number of CP0 hazards and the
number of instructions to be inserted.


Table 19-2 Example of Calculating Number of CP0 Hazards and Number of Instructions Inserted

Destination         Source                          Conflicting     Number of       Calculation
                                                    internal        instructions
                                                    resources       inserted
TLBWR/TLBWI         TLBP                            TLB entry       (illegible)
TLBWR/TLBWI         Load/store using newly          TLB entry       (illegible)
                    rewritten TLB
TLBWR/TLBWI         Instruction fetch using newly   TLB entry       (illegible)
                    rewritten TLB
MTC0 Status [CU]    Coprocessor instruction         Status [CU]     (illegible)
                    requiring setting of CU
TLBWR               MFC0 EntryHi                    EntryHi         3               8 - (4 + 1)
MTC0 EntryLo0       TLBWR/TLBWI                     EntryLo0        1               7 - (5 + 1)
TLBP                MFC0 Index                      Index           2               7 - (4 + 1)
MTC0 EntryHi        TLBP                            EntryHi         1
MTC0 EPC            ERET                            EPC             2
MTC0 Status         ERET                            Status          3
MTC0 Status [IE]*   Instruction causing interrupt   Status [IE]     3

* The number of hazards is undefined if the execution sequence is changed by an
  exception. In this case, the minimum number of hazards until the value of the IE
  bit is determined and the maximum number of hazards until a pending and enabled
  interrupt occurs may be the same.
Differences Between the VR4300, VR4305, and VR4310

The following table describes the differences between the VR4300, VR4305, and
VR4310.
Table A-1 Differences Between the VR4300, VR4305, and VR4310

Parameter                                      VR4300, VR4305, VR4310
System bus
  Write data transfer rate setting             Two types (D, Dxx)
  Initial value setting pins at reset time     DivMode(1:0) / DivMode(2:0)
                                               (can be set on power application only)
  Block write access                           Sequential ordering
  State after final data write                 Final data retained
  Non-cache high-speed write                   Provided
Integer operation unit
  Corresponding instructions                   MIPS I, II, and III instruction sets
Cache memory
  Data protection                              None
JTAG interface                                 Provided
SyncOut-SyncIn path                            Provided
Clock interface
  Input vs. internal multiplication rate       (illegible)
  Internal vs. bus frequency division rate     (illegible)
Power mode
  Low power mode                               Pipeline/system bus operated at a quarter of
                                               the normal rate (VR4300, VR4305); None (VR4310)
  Wait mode                                    None
PRId register                                  Imp = 0x0B


*1. The 1.5 times frequency setting is allowed with the 100 MHz model only. 
(With the 133 MHz model, this setting is reserved.) 


*2. The 4 times frequency setting is allowed with the 133 MHz model only. 
(With the 100 MHz model, this setting is reserved.) 


*3. The 2.5 times frequency setting is allowed with the 167 MHz model only. 
(With the 133 MHz model, this setting is reserved.) 


*4. The 133 MHz model of the VR4300 is not supported.
The VR4300 is slightly different from the VR4400 in terms of system design and
software. This appendix describes the differences between the VR4300 and the
VR4400.

The major differences lie in cache handling, because the VR4300 supports neither a
secondary cache control function nor a multiprocessing function, and because it
employs a 32-bit external bus interface.
B.1 Differences in Software

The logical differences in software lie in the specifications of the CP0 registers.
These differences are shown in Table B-1.


B.1.1 CACHE Instruction 


Up to 4 MB of secondary cache memory can be connected to the VR4400. By contrast,
the VR4300 does not support a secondary cache. Therefore, the operations of the
CACHE instructions that reference the SD (secondary data cache) and SI (secondary
instruction cache) are undefined on the VR4300.

All write-back processing transfers data from the primary cache to main memory.

The CACHE instruction Hit Set Virtual, which is used to access the SD and SI on the
VR4400, is undefined on the VR4300.

The Dirty bit of the data cache (the W bit on the VR4400) can be cleared by the
CACHE instruction Hit_Write_Back.

The VR4300 has one cache state bit, whereas the VR4400 has two cache state bits to
support multiprocessing. To manipulate this bit on the VR4300, write bit 7 of the
TagLo register using a CACHE instruction (Index_Store_Tag_D). With the VR4400,
bits 6 and 7 of the TagLo register are written.


B.1.2 Cache Parity 


Because the VR4300 does not check cache data using parity, the CacheErr register
(27) always reads 0, and writes to this register are ignored. The Parity Error
register (26) can be used only for self-diagnosis and cannot be used to manipulate
the cache.


B.1.3 Status Register 


The bit specifications of the Status register are slightly different between the
VR4300 and the VR4400.

The fixed bits (bits 24 and 27) of the Status register of the VR4400 function as the
instruction trace support (ITS) bit (bit 24) and the low power mode* (RP) bit
(bit 27) on the VR4300.

* The low power mode is supported only in the 100 MHz model of the VR4300 and
  in the VR4305. Fix the RP bit of the 133 MHz model of the VR4300 and of the
  VR4310 to 0.
The CH bit of the VR4300 can be written only by software. With the VR4400,
however, this bit is set or cleared by hardware when a secondary cache instruction
is executed.

The CE and DE bits of the Status register of the VR4300 are used to manipulate
parity and do not affect operation.


For details, refer to 6.3.5 Status Register (12). 


B.1.4 Config Register 


The Config register of the VR4300 supports only part of the bit functions of the
Config register of the VR4400.


For details, refer to 5.4.6 Config Register (16). 


B.1.5 Status of FCR31 on Occurrence of Unimplemented Operation Exception 


If the floating-point unimplemented operation exception occurs on the VR4400, the
cause bits of FCR31 for the floating-point exceptions other than the unimplemented
operation exception bit (E) are undefined. The exception handler for the
unimplemented operation exception should therefore ignore all cause bits other
than the E bit.

The VR4300 is more strictly defined. If the unimplemented operation exception
occurs, the cause bits of the other floating-point exceptions are not set.


B.1.6 Integer Zero Division 


If an integer is divided by zero, the result is undefined in the MIPS ISA (Instruction
Set Architecture). This illegal operation returns the following values to the
registers of the VR4300 and VR4400.

Processor   Dividend   Lo register    Hi register
VR4400      ≥ 0        0xFFFF FFFF    Dividend
            < 0        0x0000 0001    Dividend
VR4300      ≥ 0        0x7FFF FFFF    Dividend
B.1.7 Cache Parity Error Exception

Because the VR4300 does not check data using cache parity, a parity error
exception does not occur.

Table B-1 Differences in Software

Item                                          VR4300                          VR4400
CACHE instruction    Secondary cache          Not supported                   Supported
                     Parity                   None                            Provided
Status register      Bit 27                   Low power mode*                 0
                     Bit 24                   Instruction trace support       0
                     CE and DE bits           (illegible)                     Used for parity
Config register                               Only part of bit functions      All supported
                                              supported
Unimplemented operation exception             Cause bits other than E bit     Cause bits other than E bit
                                              cleared                         undefined
Integer zero division                         Value returned to register differs
Cache error exception                         Does not occur                  Always normal operation

* 100 MHz model of the VR4300 and the VR4305 only
B.2 Differences in System Design

This section describes the differences in system design between the VR4300 and
the VR4400. Table B-2 summarizes these differences.

B.2.1 Initialization of Processor

With the VR4400, many modes must be set at boot. Setting the modes of the VR4300
is simpler, because the VR4300 sets them through external pins rather than by
software.

The Reset signal of the VR4300 may be either active or inactive during a cold reset.
However, do not change the value of this signal during the reset sequence.

At a soft reset, keep the Reset signal of the VR4300 active for at least 16
MasterClock cycles. With the VR4400, the Reset signal must be kept active for at
least 64 MasterClock cycles.


B.2.2 System Interface
The SysAD bus of the VR4400 is 64 bits wide, but the VR4300 has a 32-bit SysAD
bus without a parity check function.

Multi-Processing Function and Secondary Cache Control Function

The VR4300 uses the same SysAD bus protocol as the VR4400. However, because the
VR4300 supports neither a multiprocessing function nor a secondary cache control
function, its external bus implements only part of the SysAD bus specifications.

The operations related to the multiprocessing function and the secondary cache that
are defined for the VR4400 are undefined on the VR4300.

Line Size of Cache
The cache line sizes of the VR4300 are as follows.
Instruction cache : 8 words (32 bytes)
Data cache        : 4 words (16 bytes)
Data Transfer Rate

The VR4400 has nine data rates (D, DDx, DDxx, DxDx, DDxxx, DDxxxx, DxxDxx,
DDxxxxx, and DxxxDxxx).

The VR4300 has two data rates (D and Dxx). These data rates are selected by using
the EP bit of the Config register.

The VR4400 requires at least 4 cycles per processor request. Consequently, if
successive single read requests are made, or if write and read requests are made in
succession, two idle cycles are inserted between the two requests, as in "ADxxAD".

In the fastest mode of the VR4300 (data rate: D), however, no idle cycle is needed
between successive write or read cycles, as in "ADAD".

When data is input from an external device, the VR4300 can accept any data
transfer pattern on the SysAD bus. The VR4300 can input data at a data rate of
"DDDDDDDD", but cannot accept a data stream longer than 8 words (32 bytes).

TClock and RClock

The VR4400 has two TClock pins.
The VR4300 has only one TClock pin, to reduce power consumption.

The VR4400 has RClock as the reception clock of the external agent, but the
VR4300 does not have RClock because it both transmits and receives data using
TClock.

Effect of RP Bit

With the VR4400, SClock and TClock are not affected by the RP bit. The VR4300, in
contrast, can reduce the frequencies of SClock and TClock to one quarter of the
normal level by using the RP bit.

When using this function, if an external circuit (such as a DRAM refresh counter) is
affected by changes in the frequency of the clock that the VR4300 supplies to
external devices, the software must include a process that adapts that circuit to
the frequency change.

* 100 MHz model of the VR4300 and the VR4305 only
Table B-2 Differences in System Design

Function                                   VR4300                       VR4400
Initialization of processor                Set by external pins         Set by software
System       Bus width                     32                           64
interface    Data check                    Not performed                Parity/ECC selectable
             Multi-processing and          Not supported                Supported
             secondary cache
             Line size of cache            Instruction: 8 words         4/8 words selectable for
                                           Data: 4 words                both instruction/data cache
             Data rate                     2 types                      9 types
             TClock                        1                            2
             RClock                        None                         2
Effect of RP bit                           Reduces frequencies of       Does not affect TClock and
                                           TClock and SClock to 1/4*    SClock

* 100 MHz model of the VR4300 and the VR4305 only
B.3 Other Differences

In addition to the above differences, the VR4300 and VR4400 differ in the
following points. The differences described in this section are summarized in
Table B-3.

B.3.1 Cache Size

The specifications of the primary caches of the VR4000, VR4400, and VR4300 are
shown in the following table.

Product Name              VR4300                           VR4000        VR4400
Cache capacity  Instruction   16 KB                        8 KB          16 KB
                Data          8 KB                         8 KB          16 KB
Line size                 Instruction: 8 words (32 bytes)  4/8 words selectable
                          Data: 4 words (16 bytes)
Method                    Direct map, virtual index


When programming routines that initialize, invalidate, or flush the caches, keep the
differences in cache size in mind.
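Because these caches are direct mapped and virtually indexed, an index-type routine only needs to walk one cache's worth of addresses, one line at a time. The Python sketch below computes those addresses from the VR4300 sizes in the table above. The kseg0 base address 0x8000_0000 (the usual MIPS unmapped, cached kernel segment) and the helper names are assumptions; real code would issue a CACHE instruction (for example an Index-type operation) at each address.

```python
KSEG0_BASE = 0x8000_0000   # assumed base: unmapped, cached kernel segment

# (capacity in bytes, line size in bytes) for the VR4300 primary caches
VR4300_CACHES = {
    "instruction": (16 * 1024, 32),
    "data": (8 * 1024, 16),
}


def index_op_addresses(size: int, line: int, base: int = KSEG0_BASE) -> range:
    """One address per cache line, covering every index of a direct-mapped cache."""
    return range(base, base + size, line)


for name, (size, line) in VR4300_CACHES.items():
    addrs = index_op_addresses(size, line)
    print(name, len(addrs), "lines,", hex(addrs[0]), "...", hex(addrs[-1]))
```

Running the same loop with the VR4300 data cache size on a VR4400, whose data cache is 16 KB, would reach only half of its lines; this is the kind of mismatch the paragraph above warns about.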


B.3.2 TLB 


TLB Entry 


The VR4300 has a fully associative TLB with 32 entries. Each entry maps a pair of
even and odd pages.

The TLB of the VR4400 has the same structure as that of the VR4300, but has 48
entries.

Interaction between IMT and TLB Manipulations

The operation of the VR4400 is undefined when a TLB instruction accesses the JTLB
during an instruction TLB miss (IMT) stall; consequently, a TLB invalid exception
may occur. This is especially likely when software reads or writes (TLBWI, TLBWR,
or TLBR) an entry other than the one that caused the instruction TLB miss.

This does not apply to the VR4300.
B.3.3 Floating-Point Unit

Floating-Point Data Path

The floating-point operations of the VR4300 are executed using the main pipeline
and data path of the integer operation unit. Therefore, while a multicycle
floating-point instruction is being executed, the integer pipeline stalls.

The VR4400 has a dedicated floating-point data path in addition to the integer data
path. Therefore, if a program whose floating-point and integer instructions have
been optimized for the VR4400 is executed on the VR4300, little benefit from that
optimization can be expected.

Instruction Execution Time

The VR4300 completes in one cycle any multicycle instruction that has caused a
source exception (an exception caused by the source operand of an instruction).
Depending on the trap enable flag, it either issues the default result in that cycle
or signals a trap exception in the next cycle. In addition, a calculation such as
0 × 0 can be executed in fewer cycles than an ordinary calculation.

The VR4400 always executes each multicycle instruction in the same number of
cycles, regardless of whether an exception occurs.


CVT.[S,D].L Instruction

When converting a 64-bit integer into a single- or double-precision floating-point
number, the VR4400 generates a floating-point unimplemented operation exception
unless bits 63 through 52 of the integer are all 0s or all 1s.

The VR4300 generates the floating-point unimplemented operation exception unless
bits 63 through 55 of the 64-bit integer are all 0s or all 1s.
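The two bit-range rules above can be compared directly. In this Python sketch, `cvt_from_long_unimplemented` is a hypothetical helper; the 64-bit integer is passed as a Python int, and negative values are converted to their 64-bit two's-complement bit pattern before the check.

```python
def cvt_from_long_unimplemented(value: int, low_bit: int) -> bool:
    """True if CVT.S.L / CVT.D.L would take an unimplemented operation exception.

    low_bit = 55 for the VR4300 and 52 for the VR4400: bits 63..low_bit of the
    64-bit integer must be all 0s or all 1s for the conversion to proceed.
    """
    u = value & ((1 << 64) - 1)             # 64-bit two's-complement image
    top = u >> low_bit                      # bits 63..low_bit
    all_ones = (1 << (64 - low_bit)) - 1
    return top not in (0, all_ones)


for v in (2**55 - 1, 2**55, -(2**55), -(2**55) - 1):
    print(v, cvt_from_long_unimplemented(v, 55), cvt_from_long_unimplemented(v, 52))
```

Because the VR4300 checks fewer high-order bits, it accepts a wider range of integers (up to about ±2^55) than the VR4400 (about ±2^52) before falling back to the exception handler.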


B.3.4 Pipeline 
The VR4400 uses an 8-stage super pipeline.

The VR4300 uses a 5-stage pipeline like that of the VR3000. The pipeline of the
VR4300 is not a super pipeline, but it is functionally equivalent to one. However,
the performance of a program that has been optimized for the super pipeline may be
affected.

The VR4300 generates fewer stall cycles than the VR4400.
B.3.5 Interrupt

Bit 15 of the Cause register of the VR4300 is dedicated to the timer interrupt,
which occurs when the value of the Count register matches the value of the Compare
register. Therefore, the VR4300 does not have the Int5 pin that the VR4400
provides.

Because the VR4300 does not have bit 5 in the Interrupt register*, writing that bit
via the system interface has no effect.

With the VR4400, the user can select whether bit 15 of the Cause register is used
for the timer interrupt or for bit 5 of the Interrupt register.

* This register cannot be written directly by software.


B.3.6 Kernel Physical Address Segment Configuration

The VR4300 supports two algorithms (uncached and noncoherent) for maintaining
cache coherency. While the VR4400 supports a 36-bit physical address space, the
VR4300 supports a 32-bit physical address space. These two points affect the
virtual address mapping of the Kernel physical address space segment (xkphys),
which does not use the TLB.

Both the VR4400 and the VR4300 have eight address spaces in this segment, but the
size of each space differs: each space is 64 GB on the VR4400 and 4 GB on the
VR4300.

B.3.7 JTAG

The VR4300 conforms to IEEE 1149.1-1990. Consequently, the JTDO signal is active
in the Shift-IR and Shift-DR modes.

Because the VR4400 conforms to an earlier version of IEEE 1149.1, its JTDO signal
is not driven in these modes.
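The xkphys mapping described in B.3.6 can be illustrated with a short Python sketch. It relies on standard MIPS III conventions that this appendix does not restate, so treat them as assumptions: xkphys addresses have bits 63..62 = 10, bits 61..59 select one of the eight spaces (the cache algorithm), with 2 meaning uncached and 3 meaning cacheable noncoherent. The helper and constant names are hypothetical.

```python
XKPHYS_BASE = 0x8000_0000_0000_0000   # bits 63..62 = 10 select xkphys

# Cache-algorithm values for the two algorithms the VR4300 supports
# (standard MIPS III encodings, assumed here).
CCA_UNCACHED = 2
CCA_NONCOHERENT = 3


def xkphys_address(cca: int, paddr: int, paddr_bits: int = 32) -> int:
    """64-bit xkphys virtual address that maps physical address `paddr`.

    Bits 61..59 choose one of the eight spaces; paddr_bits is 32 for the
    VR4300 (4 GB per space) and 36 for the VR4400 (64 GB per space).
    """
    if not 0 <= cca <= 7:
        raise ValueError("cca selects one of eight spaces (0..7)")
    if not 0 <= paddr < (1 << paddr_bits):
        raise ValueError("physical address outside this processor's space")
    return XKPHYS_BASE | (cca << 59) | paddr


print(hex(xkphys_address(CCA_UNCACHED, 0x1FC0_0000)))
```

Passing `paddr_bits=36` models the larger VR4400 spaces; with the VR4300 default, any physical address of 4 GB or more is rejected.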
Table B-3 Other Differences


Item                                   VR4300                         VR4400

Instruction cache size                 16 KB                          16 KB (VR4000: 8 KB)

Data cache size                        8 KB                           16 KB (VR4000: 8 KB)

TLB: TLB size                          32 entries                     48 entries

TLB: Interaction between ITM and       TLB operation is corrected     TLB invalid exception occurs
  TLB manipulations

Floating-point operation:              Shared with integer operation  Processed by dedicated
  Data path                            pipeline                       pipeline

Floating-point operation:              All multi-cycle instructions   Each multi-cycle instruction
  Instruction execution time           are executed in 1 cycle when   is executed in the same
                                       a source exception occurs      number of cycles regardless
                                                                      of whether an exception occurs

CVT.[S, D].L instruction (checking     All bits 63 to 55 are 1 or 0   All bits 63 to 52 are 1 or 0
  of floating-point unimplemented
  operation exception)

Effect of RP bit                       Reduces operating frequency    Does not affect operating
                                       to 1/4*                        frequency

Pipeline                               5 stages (basic pipeline)      8 stages (superpipeline)

Interrupt: Cause register (bit 15)     Dedicated to timer interrupt   Selectable by user

Interrupt: Interrupt register (bit 5)  None

Kernel physical address segment        32 bits                        36 bits
  configuration (xkphys): physical
  address space supported
  Valid address space                  8                              5

JTAG                                   JTDO active in Shift-IR and    JTDO not driven in Shift-IR
                                       Shift-DR modes                 and Shift-DR modes


* 100 MHz model of the VR4300 and the VR4305 only
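The CVT.[S, D].L row of Table B-3 can be read as a range check on the 64-bit integer source. The Python sketch below assumes that reading: the conversion completes without an unimplemented operation exception only when the listed high-order bits are all 0 or all 1. The helper name is hypothetical.

```python
VR4300_CHECK_LSB = 55  # Table B-3: bits 63 to 55 must be all 1 or all 0
VR4400_CHECK_LSB = 52  # Table B-3: bits 63 to 52 must be all 1 or all 0


def cvt_from_long_in_hardware(src: int, check_lsb: int) -> bool:
    """True if bits 63..check_lsb of the 64-bit source are all 0s or all 1s.

    `src` may be a signed Python int; it is first reduced to its 64-bit
    two's complement bit pattern.
    """
    pattern = src & ((1 << 64) - 1)
    upper = pattern >> check_lsb                # bits 63..check_lsb
    all_ones = (1 << (64 - check_lsb)) - 1
    return upper in (0, all_ones)


if __name__ == "__main__":
    for value in (2**53, 2**55, -(2**55), -(2**55) - 1):
        print(value,
              cvt_from_long_in_hardware(value, VR4300_CHECK_LSB),
              cvt_from_long_in_hardware(value, VR4400_CHECK_LSB))
```

Under this reading, the VR4300 accepts sources from -2^55 to 2^55 - 1, and the VR4400 accepts sources from -2^52 to 2^52 - 1.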
Differences from VR4200


The VR4300 differs slightly from the VR4200 in terms of system design and
software. This appendix describes the differences between the VR4300 and
VR4200.


The major differences are that the VR4300 employs a new 32-bit system interface
and eliminates parity-based data checking.
C.1 Differences in Software


The differences visible to software lie in the specifications of the CP0
registers. These differences are shown in Table C-1.


C.1.1 Cache Parity 


Because the VR4300 does not check cache data by parity, the Cache Error register
(27) always reads as 0, and writes to this register are ignored. The Parity
Error register (26) can be used only for self-diagnosis and cannot be used to
manipulate the cache.


C.1.2 Status Register 


The bit specifications of the Status register differ slightly between the
VR4300 and VR4200. On the VR4200, the CE and DE bits of the Status register are
used to manipulate parity; on the VR4300, these bits do not function and do not
affect operation.


C.1.3 Config Register 
The bit specifications of the Config register differ slightly.


On the VR4200, the BE bit and EP area are set by hardware on reset from the
BigEndian and DataRate external pins, and software can read them.


On the VR4300, the BE bit and EP area are set to default values at cold reset:
the EP area defaults to 0000 and the BE bit defaults to 1. Software can change
these values afterward. Bits 18 and 19, which are 00 on the VR4200, are 01 on
the VR4300.


For details, refer to 5.4.6 Config Register (16). 
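Because software sets the VR4300 data rate and endianness through Config register fields, changing them is a read-modify-write of the register value. The Python sketch below computes only the new value; transferring it to and from CP0 is left to the usual MFC0/MTC0 sequence. It assumes the R4000-family field positions, with the EP area in bits 27:24 and the BE bit in bit 15. Check these positions against 5.4.6 Config Register (16). The EP encodings are taken from the EP table in C.2.1, and the function names are hypothetical.

```python
EP_SHIFT, EP_WIDTH_MASK = 24, 0xF  # assumption: EP area occupies bits 27:24
BE_SHIFT = 15                      # assumption: BE bit is bit 15
MASK32 = 0xFFFF_FFFF

# Transfer patterns selected by the EP area (from the EP table in C.2.1).
EP_PATTERNS = {0b0000: "D", 0b0110: "DxxDxx"}


def get_ep(config: int) -> int:
    return (config >> EP_SHIFT) & EP_WIDTH_MASK


def set_ep(config: int, ep: int) -> int:
    """Return `config` with the EP area replaced by `ep`."""
    if ep not in EP_PATTERNS:
        raise ValueError(f"EP value {ep:04b} is not one of the listed patterns")
    cleared = config & ~(EP_WIDTH_MASK << EP_SHIFT) & MASK32
    return cleared | (ep << EP_SHIFT)


def get_be(config: int) -> int:
    return (config >> BE_SHIFT) & 1


def set_be(config: int, big_endian: bool) -> int:
    """Return `config` with the BE bit set (1) or cleared (0)."""
    cleared = config & ~(1 << BE_SHIFT) & MASK32
    return cleared | (int(big_endian) << BE_SHIFT)


if __name__ == "__main__":
    # Cold-reset defaults EP = 0000, BE = 1; other Config bits are left at 0
    # here purely for illustration.
    config = set_be(set_ep(0, 0b0000), True)
    config = set_ep(config, 0b0110)            # switch to the DxxDxx pattern
    print(hex(config), EP_PATTERNS[get_ep(config)], get_be(config))
```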
C.1.4 Cache Parity Error Exception


Because the VR4300 does not check data by cache parity, it does not generate the
cache parity error exception.


The VR4200 generates the cache parity error exception (DCPE) in the WB stage.


Table C-1 Differences in Software


Function                             VR4300                 VR4200
Cache parity                         Not supported          Supported
Status register: CE and DE bits      Do not function        Used to manipulate parity
Config register: BE bit and EP area  Set to default values  Set from information on external pins
Config register: bits 18 and 19      01                     00
C.2 Differences in System Design


This section describes the differences in system design between the VR4300 and
VR4200. Table C-2 summarizes these differences.


C.2.1 System Interface


The system interface of the VR4200 is a 64-bit bus with a parity check function,
whereas that of the VR4300 is a 32-bit bus without parity checking. For details,
refer to Chapter 12 System Interface.


During an instruction block write, the VR4200 performs four doubleword data
transfers with one idle cycle, whereas the VR4300 performs eight word data
transfers to write the main memory.


During a data block write, the VR4200 performs two doubleword data transfers,
whereas the VR4300 performs four word data transfers to write the main memory.


The VR4200 supports two data rates, "DDx" and "Dxx", and the VR4300 also
supports two, "D" and "Dxx". The VR4200 data rate is set with the DataRate pin,
whereas the VR4300 data rate is set by software through the EP area of the
Config register. The table below shows the transfer data pattern for each EP
area setting.


EP Area   Transfer Pattern
0000      D
0110      DxxDxx


C.2.2 Clock


The VR4300 does not output the MasterOut and RClock signals.


The frequency of the pipeline clock (PClock) of the VR4400 and VR4200 is
normally twice the MasterClock frequency. The VR4300 can change this frequency
ratio according to the DivMode(1:0)*1 pins (refer to Table 2-2 Clock/Control
Interface Signals). The PClock:MasterClock frequency ratio can be selected from
2:1, 3:1, 4:1, or 3:2*2. The VR4200 normally generates SClock and TClock by
dividing PClock by 2, whereas the TClock of the VR4300 normally runs at the same
frequency as MasterClock.


In low power mode*3, the PClock, SClock, and TClock of the VR4300 can be reduced
to 1/4 of their normal frequency, as on the VR4200.


*1. In VR4300 and VR4305. In VR4310, DivMode(2:0).
*2. In VR4300. In VR4305, the frequency ratio can be set to 1:1, 2:1, or 3:1. In
VR4310, it can be set to 2:1, 3:1, 4:1, 5:1, 6:1, or 5:2.


*3. 100 MHz model of the VR4300 and the VR4305 only
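The clock relationships above reduce to simple ratio arithmetic. The Python sketch below uses exact fractions so that the 3:2 and 5:2 ratios stay exact. The ratio table is copied from the text and footnote *2. The helper covers only the VR4300 and models only PClock and TClock, whose relationships to MasterClock are stated above. Its name is hypothetical.

```python
from fractions import Fraction

# PClock:MasterClock ratios listed in the text and in footnote *2.
PCLOCK_RATIOS = {
    "VR4300": (Fraction(2), Fraction(3), Fraction(4), Fraction(3, 2)),
    "VR4305": (Fraction(1), Fraction(2), Fraction(3)),
    "VR4310": (Fraction(2), Fraction(3), Fraction(4), Fraction(5),
               Fraction(6), Fraction(5, 2)),
}


def vr4300_clocks(master_mhz, ratio, low_power=False):
    """Return (PClock, TClock) in MHz for the VR4300.

    PClock = MasterClock * ratio, and TClock runs at MasterClock. Low power
    mode divides both by 4 (100 MHz model only, per footnote *3).
    """
    ratio = Fraction(ratio)
    if ratio not in PCLOCK_RATIOS["VR4300"]:
        raise ValueError(f"unsupported PClock:MasterClock ratio {ratio}")
    master = Fraction(master_mhz)
    pclock, tclock = master * ratio, master
    if low_power:
        pclock, tclock = pclock / 4, tclock / 4
    return pclock, tclock


if __name__ == "__main__":
    for ratio in PCLOCK_RATIOS["VR4300"]:
        pclock, tclock = vr4300_clocks(50, ratio)
        print(f"PClock:MasterClock = {ratio.numerator}:{ratio.denominator} -> "
              f"PClock {float(pclock)} MHz, TClock {float(tclock)} MHz")
```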


C.2.3 Package 


The VR4200 is housed in a 208-pin plastic QFP, and the VR4300 in a 120-pin
plastic QFP.


Table C-2 Differences in System Design


Function   Item                     VR4300                    VR4200
System     SysAD bus                No parity, 32 bits        With parity, 64 bits
interface  Instruction block write  Word data, 8 times        Doubleword data, 4 times
           Data block write         Word data, 4 times        Doubleword data, 2 times
           Data pattern             Set by Config register    Set by external pins
                                    (D, Dxx)                  (DDx, Dxx)
Clock      MasterOut, RClock        Not output                Output
           PClock                   Ratio to MasterClock is   Twice the normal
                                    variable                  MasterClock frequency
           TClock                   Same frequency as normal  PClock divided by two
                                    MasterClock
Package                             120-pin plastic QFP       208-pin plastic QFP
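The block write rows of Table C-2 describe the same amount of data moved over buses of different widths. The Python sketch below makes that explicit. It derives the block sizes from the VR4300 counts (8 words and 4 words of 4 bytes each) and divides by each bus width. The idle cycle in the VR4200 instruction block write is not modeled, and the helper name is hypothetical.

```python
WORD = 4        # bytes per transfer on the 32-bit VR4300 SysAD bus
DOUBLEWORD = 8  # bytes per transfer on the 64-bit VR4200 SysAD bus

# Block sizes implied by Table C-2 (8 words and 4 words).
INSTRUCTION_BLOCK = 8 * WORD  # 32 bytes
DATA_BLOCK = 4 * WORD         # 16 bytes


def transfers(block_bytes: int, bus_bytes: int) -> int:
    """Number of SysAD data transfers needed to move one block."""
    if block_bytes % bus_bytes:
        raise ValueError("block size must be a multiple of the bus width")
    return block_bytes // bus_bytes


if __name__ == "__main__":
    for name, size in (("Instruction", INSTRUCTION_BLOCK), ("Data", DATA_BLOCK)):
        print(f"{name} block write ({size} bytes): "
              f"VR4300 {transfers(size, WORD)} word transfers, "
              f"VR4200 {transfers(size, DOUBLEWORD)} doubleword transfers")
```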


C.3 Other Differences 


In addition to the differences described above, the VR4300 and VR4200 differ in
the following points, which are summarized in Table C-3.


C.3.1 Physical Address 


The physical address and physical address space of the VR4200 are 33 bits wide,
whereas those of the VR4300 are 32 bits wide. Consequently, on the VR4300 the
cache tag and the page frame number field of a TLB entry are each 20 bits wide
on both the Hi and Lo sides.
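The 20-bit figure follows from the 32-bit physical address once the page offset is removed. The Python sketch below assumes a 4 KB minimum page size, which gives 12 offset bits. That page size is not stated in this appendix, but it is consistent with 32 - 12 = 20. The function names are hypothetical.

```python
from typing import Tuple

PAGE_SHIFT = 12  # assumption: 4 KB minimum page size


def pfn_width(pa_bits: int) -> int:
    """Bits left for the page frame number once the page offset is removed."""
    return pa_bits - PAGE_SHIFT


def split_physical_address(paddr: int, pa_bits: int = 32) -> Tuple[int, int]:
    """Split a physical address into (page frame number, page offset)."""
    if not 0 <= paddr < (1 << pa_bits):
        raise ValueError(f"address does not fit in {pa_bits} bits")
    return paddr >> PAGE_SHIFT, paddr & ((1 << PAGE_SHIFT) - 1)


if __name__ == "__main__":
    print(pfn_width(32), pfn_width(33))   # VR4300, VR4200
    pfn, offset = split_physical_address(0x1FC0_0ABC)
    print(hex(pfn), hex(offset))
```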
C.3.2 Write Buffer


The write buffer of the VR4200 is a two-entry doubleword buffer. The VR4300 has
a four-entry word buffer to improve performance during uncached writes.


C.3.3 Reset 


On the VR4200, the ColdReset and Reset signals are asserted active
simultaneously. On the VR4300, these signals need not be asserted at the same
time. The Reset signal of the VR4300 may be either active or inactive during a
cold reset, but its value must not change during the reset sequence. The
ColdReset signal of the VR4300 need not be synchronized with MasterClock.


C.3.4 Status(3:0) Pins 
The Status(3:0) pins provided on the VR4200 are not provided on the VR4300.


On the VR4300, when the ITS bit of the Status register is set, an instruction
cache miss occurs whenever a branch instruction is executed, and the branch
destination address is output on SysAD(31:0). However, because the VR4300 does
not have the Status(3:0) pins, the internal status of the processor cannot be
output.


Table C-3 Other Differences


Function                          VR4300                    VR4200
Physical address                  32 bits                   33 bits
Write buffer                      4-entry word buffer       2-entry doubleword buffer
ColdReset signal and MasterClock  Need not be synchronized  Must be synchronized
Status(3:0) pins                  Not provided              Provided
An unimplemented operation exception occurs when a floating-point type
conversion instruction is executed in any of the following cases:


• An overflow occurs during conversion to integer format
• The source operand is infinite
• The source operand is NaN


The type conversion instructions affected by this restriction are as follows.


CEIL.L.fmt   fd, fs        FLOOR.L.fmt  fd, fs
CEIL.W.fmt   fd, fs        FLOOR.W.fmt  fd, fs
CVT.D.fmt    fd, fs        ROUND.L.fmt  fd, fs
CVT.L.fmt    fd, fs        ROUND.W.fmt  fd, fs
CVT.S.fmt    fd, fs        TRUNC.L.fmt  fd, fs
CVT.W.fmt    fd, fs        TRUNC.W.fmt  fd, fs
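The three cases listed above can be expressed as a predicate for conversions to integer format. The Python sketch below models only those three cases and treats the source as a Python float. Source-format (S or D) details and any other implementation-specific conditions are not covered. CVT.W and CVT.L use the current rounding mode, so the caller passes the matching rounding operation. Python's `round` rounds halfway cases to even, which matches IEEE 754 round-to-nearest. The function name is hypothetical.

```python
import math

# Rounding applied by each instruction family before the range check.
ROUNDING = {"ROUND": round, "TRUNC": math.trunc, "CEIL": math.ceil, "FLOOR": math.floor}
# Signed ranges of the W (32-bit) and L (64-bit) integer formats.
INT_RANGE = {"W": (-(1 << 31), (1 << 31) - 1), "L": (-(1 << 63), (1 << 63) - 1)}


def raises_unimplemented(op: str, fmt: str, source: float) -> bool:
    """True if converting `source` to integer format `fmt` hits a listed case."""
    if math.isnan(source) or math.isinf(source):
        return True                       # NaN or infinite source operand
    low, high = INT_RANGE[fmt]
    result = ROUNDING[op](source)         # exact integer before the range check
    return not low <= result <= high      # overflow during conversion


if __name__ == "__main__":
    cases = [("TRUNC", "W", 2147483647.9), ("CEIL", "W", 2147483647.5),
             ("FLOOR", "L", float("-inf")), ("ROUND", "W", float("nan"))]
    for op, fmt, value in cases:
        print(f"{op}.{fmt} {value!r}: {raises_unimplemented(op, fmt, value)}")
```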
A
Address cycle ... 292
Address error exception ... 186
Address translation ... 125, 126
Addressing ... 41


B
BadVAddr register ... 164
Basic system clock ... 259
BEV ... 256
Block read request ... 289
Block write request ... 289
Bootstrap exception vector (BEV) ... 256
Boundary scan ... 342
Boundary scan register ... 346
Branch address ... 78
Branch delay ... 94
Branch instruction ... 77, 369
Breakpoint exception ... 192
Bus error exception ... 190
Bus mastership ... 313, 328
Bypass ... 119
Bypass register ... 345


C
Cache error register ... 178
CACHE instruction ... 112, 305
Cache line ... 275, 283
Cache line replacement ... 280, 282
Cache memory ... 273
Cache operation ... 279
Cache state transition ... 283
Cache states ... 283
Cause register ... 171
Clock generator ... 35
Clock interface ... 257
Clock-to-Q delay ... 258
CMOS discrete device ... 269
Code compatibility ... 119
Cold reset ... 248, 250
Cold reset exception ... 183
Command ... 328
Compare instruction ... 227
Compare register ... 165
Computational instruction ... 68, 226
Config register ... 151
Context register ... 163
Control/status register ... 211
Convert instruction ... 224
COp ... 112
Coprocessor 0 (CP0) ... 35
Coprocessor instruction ... 83, 369
Coprocessor unusable exception ... 193
Count register ... 164
CP0 ... 35
CP0I ... 113
CP0 bypass interlock ... 113
CP0 register ... 146
CPU instruction ... 370
CPU instruction set ... 39, 59, 363
CPU register ... 37


D
Data cache ... 36, 277, 283
Data cache addressing ... 278
Data cache busy ... 111
Data cache miss ... 111
Data cache read request ... 290
Data cycle ... 292
Data format ... 41
Data identifier ... 333, 337
Data load miss ... 281
Data store miss ... 281
DCB ... 111
DCM ... 111
Defining access types ... 62
Discarding command ... 325
Divide-by-zero exception ... 241


E
Endianness ... 331
EntryHi register ... 148
EntryLo register ... 148
EPC register ... 174
Error EPC register ... 179
Exception ... 103, 106, 180
Exception processing ... 159, 200, 237
Exception processing register ... 161
Exception program counter register ... 174
Exception vector location ... 180
Execution time ... 230
Execution unit ... 35
External agent ... 268
External arbitration ... 297, 313
External normal interrupt ... 353
External request ... 294, 298, 302, 306, 312
External write request ... 303, 316


F
FCR ... 211
FCR0 ... 216
FCR31 ... 211
Fetch miss ... 304
FGR ... 208
Fixed-point format ... 220
Flag ... 238
Floating-point computational instruction ... 226, 555
Floating-point control register ... 211
Floating-point exception ... 235
Floating-point format ... 217
Floating-point general purpose register ... 208
Floating-point load instruction ... 221, 553
Floating-point register ... 210, 255
Floating-point store instruction ... 221, 553
Floating-point transfer instruction ... 221
Floating-point unit ... 47, 207
Flow control ... 311, 330
FPR ... 210
FPU branch instruction ... 229
FPU instruction ... 221, 558
FPU instruction set ... 547


G
Gate array ... 266


H
Handshake signal ... 295
Hardware interrupt ... 356
Hazard of CP0 ... 162


I
ICB ... 108
IE ... 256
IEEE754 exception ... 244
Implementation/Revision register ... 216
Independent transfer ... 331
Index register ... 146
Inexact exception ... 240
Initialization interface ... 247
Instruction address ... 36
Instruction cache ... 35, 276, 283
Instruction cache addressing ... 278
Instruction cache busy ... 108
Instruction cache read request ... 289
Instruction-dependent exception ... 115
Instruction format ... 60
Instruction-independent exception ... 114
Instruction micro-TLB ... 49
Instruction pipeline ... 49
Instruction register ... 344
Instruction TLB miss ... 107
Instruction trace support ... 168, 256
Integer overflow exception ... 196
Interface bus ... 291
Interlock ... 103, 106
Internal cache ... 47
Interrupt ... 351
Interrupt enable (IE) ... 168, 256
Interrupt exception ... 199
Interrupt request signal ... 354
Invalid operation exception ... 240
Inverting endian ... 170
Issue cycle ... 293
ITLB ... 49
ITM ... 107

J
Joint TLB ... 48
JTAG ... 341
JTLB ... 48
Jump instruction ... 77, 369


K
Kernel address space ... 169
Kernel extended addressing mode ... 255
Kernel mode ... 133


L
LDI ... 110
LLAddr register ... 154
Load delay ... 95
Load delay slot ... 61
Load instruction ... 61, 367, 553
Load interlock ... 110
Load miss ... 304
Low power mode ... 254, 264, 360
M
Master state ... 296
MasterClock ... 259, 263
MCI ... 109
Memory hierarchy ... 274
Memory management system ... 48, 121
Multicycle instruction interlock ... 109


N
NaN ... 218
NMI ... 352
NMI exception ... 185
Non-maskable interrupt (NMI) ... 352
Normal power mode ... 254, 360
Number of delay cycles ... 233


O
Opcode bit encoding ... 544, 613
Operating mode ... 49, 127, 169
Operation during no branch ... 78
Overflow exception ... 242


P
PageMask register ... 148, 149
Parity error register ... 178
PClock ... 259
Phase-locked loop (PLL) ... 263
Phase-locked system ... 265, 266
Physical address ... 123, 289
Pin configuration (Top View) ... 52
Pin function ... 51, 54
Pipeline ... 36, 89
Pipeline exception ... 114
PLL ... 263
PLL passive element ... 615
Power mode ... 254
Power off mode ... 255, 361
Power-ON reset ... 248, 249
Precision of exception ... 161
PRId register ... 151
Priority (exception) ... 182
Priority (exception and interlock) ... 116
Privilege mode ... 255
Processor read request ... 301, 306
Processor request ... 293, 298, 306
Processor revision identifier register ... 151
Processor write request ... 301, 309


R
Random register ... 147
Read command ... 327
Read request ... 334
Read response ... 303, 313, 317, 330
Re-executing command ... 325
Release latency time ... 332
Request control ... 300, 302
Request issuance ... 300, 302
Reserved instruction exception ... 194
Reverse endianness ... 256


S
Saving and returning ... 244
SClock ... 260, 263
Sequential ordering ... 339
Slave state ... 298
Soft reset ... 248, 251
Soft reset exception ... 184
Software interrupt ... 354
Special instruction ... 81
Status register ... 165
Status on reset ... 170
Store delay slot ... 61
Store instruction ... 61, 367, 553
Store miss ... 304
Subblock ordering ... 339
Successive processing of request ... 321
Supervisor address space ... 169
Supervisor extended addressing mode ... 255
Supervisor mode ... 129
SyncIn/SyncOut ... 259
System call exception ... 191
System control coprocessor (CP0) ... 44, 142
System control coprocessor (CP0) instruction ... 86, 370
System event ... 299
System interface ... 35, 289, 296
System interface address ... 339
System interface cycle time ... 332
System timing parameter ... 263


T
TagHi register ... 154
TagLo register ... 154
TAP ... 347
TAP controller ... 348
TClock ... 260
Test access port ... 347
Timer interrupt ... 354
TLB ... 48, 122
TLB entry ... 143
TLB exception ... 187
TLB instruction ... 158
TLB invalid exception ... 188
TLB miss ... 158
TLB miss exception ... 187
TLB modification exception ... 189
Translation lookaside buffer ... 48, 122
Transmission time ... 268
Trap exception ... 195
U
Uncached area ... 305
Uncompelled change to slave state ... 298
Underflow exception ... 242
Unimplemented operation exception ... 243
User address space ... 169
User extended addressing mode ... 255
User mode ... 127


V 
Virtual address ... 124 
Virtual address translation ... 155 


W
Watch exception ... 198
WatchHi register ... 175
WatchLo register ... 175
Wired register ... 150
Write buffer ... 120
Write command ... 325
Write request ... 330, 336


X
XContext register ... 176